For years, network operations have relied on complex query languages that demand specialized knowledge. Extracting insights from network data often meant writing intricate commands in formats like SQL, a skill reserved for seasoned IT professionals. But what if anyone, regardless of expertise, could ask a simple question and get immediate, accurate answers from their network?

That's precisely what Selector Copilot makes possible. With natural language querying (NLQ), teams can interact with their network the same way they interact with AI-powered chat assistants like ChatGPT or Google Gemini. Instead of struggling with syntax, they can simply ask, "What configuration changes were made in the last 24 hours?" and get a clear, actionable response.

This shift is more than just a convenience. It fundamentally changes how teams troubleshoot, analyze, and manage network operations. It's the culmination of the layered intelligence we've explored throughout this series, starting with harmonized, enriched data, then correlation and context, and finally LLM-powered insights and AI agents. With that foundation in place, Copilot brings it all together in the most human way possible: conversation.

Complex Queries Are Hurting Your IT Efficiency

Network operations generate massive amounts of data, but accessing that data traditionally requires deep technical skill. Traditional query languages like SQL force teams to spend valuable time crafting the right commands instead of focusing on solving problems.

This complexity creates bottlenecks for non-technical stakeholders. If an operations manager needs to check system performance trends or verify whether a config change introduced instability, they must wait for a network engineer to write the proper query. That delay slows down decision-making and response time. Even within technical teams, only a handful of experts typically know how to extract insights from the data effectively. Copilot addresses this head-on by making insights universally accessible.

Network Troubleshooting Shouldn't Require an SQL Expert

Selector Copilot eliminates these barriers by automatically translating natural language into S2QL (Selector Software Query Language) queries. Instead of memorizing syntax, users can type their queries in plain English, and Selector Copilot converts them into the appropriate commands. This capability is powered by Natural Language Translation (NLT), which ensures that questions are understood and contextualized based on your network telemetry. Whether troubleshooting a performance issue or analyzing trends, users can get the necessary information without requiring expertise in query languages. And it's not just passive responses. Selector's AI can also explain why something is happening.

Behind the AI: How Selector Copilot Makes Network Data Instantly Accessible

Selector's Copilot uses a hybrid AI architecture combining Local and Cloud-based Large Language Models (LLMs) to ensure accuracy, performance, and security. Here's how a natural language query flows through the system: first, a local LLM translates the plain-English question into an S2QL query; next, that query runs against your unified network telemetry; finally, a cloud LLM refines the results into a human-readable summary and visualization. This hybrid AI approach ensures that users receive accurate, human-readable insights in seconds, transforming complex network analysis into a seamless, intuitive experience.

Why AI-Powered Cloud Models Deliver Deeper Network Intelligence

While Local LLMs handle query translation and processing, Cloud LLMs take network insights to the next level by refining and visualizing data.
These advanced AI models summarize key findings, identify patterns, and recommend next steps. Beyond intelligence, security remains a top priority. Customer-specific vector stores isolate sensitive network data, while enterprise-grade security measures prevent unauthorized access. Unlike public AI models, enterprise-grade Cloud LLMs like Google Gemini 1.5 Pro do not train on customer queries or responses, eliminating concerns about data leakage. By combining the speed of local AI processing with the depth of cloud-powered insights, Selector Copilot delivers a best-in-class network intelligence experience.

Faster Answers, Smarter Decisions: The Impact of AI in Network Ops

The ability to interact with network data using natural language improves efficiency and democratizes access to information across entire organizations. Network engineers can troubleshoot issues faster without spending time on query syntax. IT operations teams can easily identify performance trends and diagnose network slowdowns. Even executives and operations managers can retrieve key performance insights in real time without relying on IT experts. For example, instead of writing a technical query, a user can simply ask: "What devices did User X access?" The result? Faster insights, improved collaboration, and a more efficient approach to network operations.

The Future of Network Operations with AI and Natural Language

AI-driven network management is evolving rapidly, transforming how teams monitor, troubleshoot, and optimize their infrastructure. As networks become more complex, organizations need solutions that reduce operational friction and streamline decision-making. Natural language querying is at the center of this shift.

Soon, AI-powered assistants like Selector Copilot will go beyond answering queries, instead predicting issues before they occur, automating resolutions, and providing proactive insights. Enterprises will move from reactive troubleshooting to AI-assisted network intelligence, where problems are identified and addressed before they impact performance.

Scaling natural language querying across an enterprise requires seamless integration into existing workflows. By embedding AI-powered network intelligence into collaboration tools like Slack, Microsoft Teams, and ITSM platforms, organizations can break down silos and make real-time network insights accessible across teams. The shift toward self-service analytics will empower technical and non-technical users, ensuring that everyone, from engineers to executives, can make informed decisions without relying on a handful of internal experts.

For enterprises looking to stay ahead of the curve, adopting AI-driven natural language querying today is more than a convenience. It is a strategic advantage that will define the next era of network operations.

Your Network Has All The Answers – Now You Can Finally Hear Them

With Selector Copilot, speaking to your network is now a reality. Teams can retrieve insights, diagnose issues, and analyze trends using natural language, eliminating the need for specialized query skills. This isn't just an incremental improvement. It's a fundamental shift in how organizations manage their networks, enabling faster decision-making, better collaboration, and a future-proof approach to network intelligence. Now is the time to embrace AI-powered network intelligence so your team can focus on solving problems instead of writing queries.
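To make the translate-run-summarize flow concrete, here is a minimal sketch of the pipeline. Everything in it is illustrative: the function names are invented, the telemetry store is stubbed, and because S2QL's grammar is not shown in this post, the translated query is rendered as a simplified SQL-like string.

```python
# Minimal sketch of the hybrid NLQ flow described above.
# All names are illustrative; this is not Selector's implementation.

from dataclasses import dataclass

@dataclass
class QueryResult:
    rows: list     # raw rows returned by the telemetry store
    summary: str   # human-readable summary from the cloud LLM

def translate_to_s2ql(question: str) -> str:
    """Step 1: a local LLM translates plain English into a query.
    A real deployment would call a tuned local model; here we fake
    one obvious mapping to keep the sketch self-contained."""
    if "devices did user" in question.lower():
        return "SELECT device FROM access_events WHERE user = 'X' LAST 24h"
    raise ValueError("unrecognized question in this toy example")

def run_query(s2ql: str) -> list:
    """Step 2: execute the query against the unified telemetry layer."""
    return [{"device": "core-rtr-01"}, {"device": "edge-sw-07"}]  # stubbed

def summarize(rows: list) -> str:
    """Step 3: a cloud LLM turns raw rows into a readable answer."""
    return f"{len(rows)} devices matched: " + ", ".join(r["device"] for r in rows)

def copilot(question: str) -> QueryResult:
    s2ql = translate_to_s2ql(question)
    rows = run_query(s2ql)
    return QueryResult(rows=rows, summary=summarize(rows))

print(copilot("What devices did User X access?").summary)
```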
Request a demo today to see how Selector Copilot can simplify network queries and supercharge your network operations. To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.
Artificial intelligence is reshaping how enterprises manage and secure their networks, but not all AI is created equal, and not all Large Language Models (LLMs) are ready for the job. While tools like ChatGPT and Google Gemini are transforming communication and productivity, applying general-purpose LLMs to something as specialized and high-stakes as network operations is an entirely different challenge.

Networks are dynamic, complex, and context-heavy. They're built on domain-specific terminology, vendor-specific configurations, and constantly shifting operational states. You can't just drop a generic chatbot on top of a network and expect meaningful results – at least, not without serious help.

Our previous post in this series explored how Selector connects the dots across telemetry sources using machine learning and correlation models. That context – unified, enriched, and real-time – is key to unlocking the next stage in AI-powered operations: safe, explainable automation through LLMs and intelligent agents.

Why Generic LLMs Fall Short in Networking

At their core, LLMs are pattern recognition engines trained on massive amounts of public data. They excel at predicting language, but in enterprise environments, especially networks, context matters more than language. A network operator asking, "What changed in the last 24 hours on the Chicago routers?" needs a precise, data-driven response – not a guess, a hallucination, or a generic how-to. LLMs struggle to provide relevant or accurate answers without access to real-time, domain-specific information about your network environment. They weren't trained on your topology, logs, configurations, or business-critical thresholds. That's where Selector's approach stands apart.

Bridging the Gap with RAG: Domain-Specific Intelligence for LLMs

Selector utilizes Retrieval-Augmented Generation (RAG) to make LLMs genuinely useful for networking. Instead of relying on a static training set, RAG dynamically injects real-time, domain-specific context into the query process.

Here's how it works: when a user asks a question – say, "Why is there packet loss between San Jose and Atlanta?" – Selector doesn't just send that prompt to the LLM. It first retrieves relevant logs, metrics, alerts, and event data from its unified telemetry layer. Then, that data is fed into the LLM with the original question, grounding the model's response in your network's actual state.

This combination of contextual retrieval and natural language generation makes Selector's Copilot truly different. It's not just smart. It's informed. It delivers relevant, accurate, and actionable responses because they're rooted in your real-time environment.

From Conversations to Actions: The AI Agent Framework

Selector doesn't stop at insights. Its agent framework takes things further by connecting conversational queries with automated workflows. Once the LLM has identified a likely root cause or recommended action, the system can pass that result to an intelligent agent. These agents are configured to interact with your systems of record (like ServiceNow or Jira), generate a change ticket, initiate a remediation script, or even trigger an automated fix.

This unlocks a new operational model: chat-to-action. A network engineer can ask, "Restart the affected interface if error rate exceeds 10%," and the system will not only understand the intent but also validate the logic, retrieve relevant thresholds, and execute the command (or route it for approval).
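As an illustration of that chat-to-action pattern, here is a minimal sketch. The function names, interface name, and thresholds are invented, and a real agent would query live telemetry and a ticketing system rather than the stubs used here.

```python
# Toy sketch of chat-to-action: parse intent, validate the condition
# against (stubbed) live telemetry, then act or route for approval.

def current_error_rate(interface: str) -> float:
    """Stand-in for a telemetry lookup; a real agent would query
    the unified telemetry layer."""
    return 12.4  # percent

def restart_interface(interface: str) -> None:
    print(f"[action] restarting {interface}")

def route_for_approval(action: str) -> None:
    print(f"[ticket] approval requested: {action}")

def handle_intent(interface: str, threshold: float, auto_approve: bool) -> None:
    """'Restart the affected interface if error rate exceeds 10%'
    becomes: check the condition, then act or escalate."""
    rate = current_error_rate(interface)
    if rate <= threshold:
        print(f"{interface}: error rate {rate}% within threshold; no action")
    elif auto_approve:
        restart_interface(interface)
    else:
        route_for_approval(f"restart {interface} (error rate {rate}%)")

handle_intent("GigE100/0/3", threshold=10.0, auto_approve=False)
```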
With this framework, Selector makes AI not just a source of insight but a trusted operational partner.

Explainability Built In

In high-stakes environments like networking, blind trust in AI isn't an option. Selector understands this. Every insight generated – whether by the correlation engine, the LLM, or an AI agent – comes with a clear, traceable rationale. Users can always see what data was used, how it was processed, and why a particular answer or action was recommended. This transparency builds trust and supports human-in-the-loop operations, where engineers remain in control while AI handles the heavy lifting.

Fast, Accurate, Cost-Effective

Unlike many platforms that require extensive retraining or GPU-heavy infrastructure, Selector's architecture is designed for practical, scalable deployment. Selector minimizes compute cost while maximizing responsiveness and accuracy by combining lightweight local inference (via tuned local models) with selective cloud-based LLM calls. This hybrid approach means organizations get real-world value from LLMs, without the resource drain of running massive models internally.

The Future of Network Automation is Conversational

LLMs are changing the way people interact with information. Selector is changing the way people interact with their networks. By combining enriched, harmonized telemetry with domain-specific LLM intelligence and an AI agent framework, Selector enables a future where asking your network for answers – or actions – is as easy as typing a question. It's automation with context, intelligence with transparency, and AI you can actually use today.

Want to see what it feels like to talk to your network? Try our free Packet Copilot and explore how natural language redefines what's possible in network operations. And make sure to follow us on LinkedIn or X to be notified of the next post in our series, where we look at how natural language interfaces like Selector Copilot are making complex queries accessible to everyone – turning command lines into conversations.
In modern IT environments, data is abundant, but clarity is rare. Enterprises deploy dozens of monitoring tools to collect metrics, events, and logs from across the network, yet when something goes wrong, teams still scramble to connect the dots. Why? Because these data streams exist in silos, isolated by format, source, or system.

In the first blog in this series, we explored how Selector's Data Hypervisor tackles that problem head-on – normalizing and enriching raw telemetry into a unified, context-rich data layer. It's the essential first step in enabling AI-powered operations. But harmonized data alone doesn't solve the problem. To truly understand what's happening in a dynamic, multi-domain network, you need to go one step further: correlation.

Without it, each tool offers a limited view, forcing operators to conduct time-consuming investigations, switch between dashboards, review log entries, and chase alerts that may or may not be connected. The result? Slow resolution times, alert fatigue, and increased operational risk. To truly unlock the promise of AIOps, we must go beyond data collection and into data correlation—the process of transforming fragmented telemetry into meaningful, actionable context.

The Problem with Isolated Alerts

In a typical NOC scenario, a sudden service degradation can trigger dozens of independent alerts: one from a switch reporting high interface errors, log messages indicating that routing adjacencies have changed, another from synthetics noting a drop in performance, and additional alarms from adjacent devices. Each signal tells a piece of the story, but without context, operators are left guessing which is the root cause and which are symptoms.

This is the cost of disconnected data. Legacy systems treat metrics, synthetics, logs, and events as standalone entities. They miss the relationships across time, topology, and service layers that define what's happening. As networks become more complex and distributed, this lack of correlation worsens.

From Signals to Story: How Selector Connects the Dots

Selector transforms this broken workflow by delivering real-time, cross-domain correlation through the use of AI and machine learning. It doesn't just collect telemetry. It understands it. At the heart of this capability is Selector's Knowledge Service, which ingests enriched telemetry from the Data Hypervisor and uses a combination of recommender models and association models to uncover hidden relationships. By correlating across time series data, logs, alerts, and events, Selector tells a coherent story about what happened, when it happened, and why.

Whether the source is SNMP, syslog, ThousandEyes, vManage, or another telemetry stream, Selector doesn't treat them as isolated channels. It normalizes and enriches the data, aligns it by timestamp and topology, and uses ML models to group related anomalies into a single, actionable insight.

The Power of S2QL and ML-Driven Correlation

Selector's engine is powered by the Selector Software Query Language (S2QL), a purpose-built query language that enables deep correlation across diverse data types. Combined with machine learning models trained to identify patterns, clusters, and outliers, Selector rapidly detects both known and novel issues. The result is lower MTTI, MTTD, and MTTR – because operators don't have to waste time digging through layers of alerts to find what matters. They're given the root cause upfront, with all supporting evidence attached.
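To illustrate the idea (though not Selector's actual models), here is a minimal sketch that groups anomalies into one incident when they are close in time and involve adjacent devices; the topology, signals, and window are invented.

```python
# Toy temporal + topological correlation: union anomalies into incidents
# when they fall within a time window AND touch adjacent devices.

TOPOLOGY = {  # adjacency between devices (hypothetical)
    "core-rtr-01": {"edge-sw-07", "core-rtr-02"},
    "edge-sw-07": {"core-rtr-01"},
    "core-rtr-02": {"core-rtr-01"},
}

anomalies = [  # (timestamp seconds, device, signal)
    (100, "edge-sw-07", "interface errors rising"),
    (104, "core-rtr-01", "BGP adjacency change"),
    (900, "core-rtr-02", "CPU spike"),
]

def correlate(events, window=60):
    """Attach an event to an existing incident only if it is both
    close in time and topologically adjacent; otherwise start a new one."""
    incidents = []
    for ts, dev, signal in sorted(events):
        for inc in incidents:
            close = any(abs(ts - t) <= window for t, _, _ in inc)
            adjacent = any(d == dev or d in TOPOLOGY.get(dev, ()) for _, d, _ in inc)
            if close and adjacent:
                inc.append((ts, dev, signal))
                break
        else:
            incidents.append([(ts, dev, signal)])
    return incidents

for i, inc in enumerate(correlate(anomalies), 1):
    print(f"incident {i}: {inc}")  # the first two events merge into one incident
```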
Explainable AI: Trust Built In

While automation and AI are powerful, they must also be transparent and accountable. Selector builds human-in-the-loop explainability into every step. Each insight is accompanied by a clear rationale, traceable data sources, and the ability to drill down into contributing metrics, events, or logs. Not only can operators trust the result, but they can also understand how it was reached. This creates confidence in the system's recommendations, accelerates adoption, and reduces resistance to AI-powered workflows.

Real-World Impact – Without the Wait

Unlike legacy platforms that require months of customization, training, or expensive GPU infrastructure, Selector delivers value in days. Its plug-and-play integrations and low-code configuration make it fast to deploy, even in large, complex environments. And because the platform is built on Kubernetes and supports SaaS and on-prem deployments, it can scale up or down depending on your organization's needs with no disruption to existing workflows.

Correlation is the Bridge to AI-Driven Automation

Correlation isn't just about better alerts. It's the gateway to autonomous operations. When systems can understand the full context of an issue, they can begin to anticipate failures, recommend remediations, or even trigger automated workflows through integrations like Itential, ServiceNow, or PagerDuty. Selector's architecture is built for this future. The same models that correlate anomalies today will drive predictive alerts, capacity forecasting, and self-healing actions tomorrow.

If your team is drowning in alerts but starving for answers, it's time to shift from fragmented signals to unified stories. Selector enables context-aware correlation that's fast, explainable, and designed for modern network complexity. Schedule a demo to see how Selector can help your team move from noise to knowledge, and follow us on LinkedIn or X to be notified of the next blog in "The Path to AI-Powered Network Operations," where we will explore how Selector leverages LLMs and AI Agents to unlock the next level of intelligent automation.
Picture this: Your phone rings in the middle of the night. It's your engineering lead, calling to inform you of a significant outage affecting your customer-facing services. As your network operations team jumps into action, they're greeted with chaos. Over 40 alerts flood their screens simultaneously. Your network, infrastructure monitoring, and application performance monitoring tools all fire independently, each with its own dashboard, presenting data in incompatible formats. It's like trying to solve a jigsaw puzzle blindfolded, with pieces scattered across multiple rooms and no map of how they connect.

Without full-stack visibility across all layers, valuable time is lost trying to piece together the fragmented clues, which prolongs downtime and costs businesses thousands of dollars per minute. The longer it takes to identify the root cause, the longer your customers and revenue will remain impacted. In scenarios like these, disconnected data isn't just inconvenient. It's financially devastating.

Why Disconnected Data Is Holding Back Network Operations

Today's enterprises are drowning in data but starving for insights. Operations teams, like in the example above, face the daunting challenge of managing massive volumes of telemetry from across the entire technology stack, spanning network hardware, infrastructure platforms, and distributed applications. Each system and vendor produces data in a different format, resulting in isolated information scattered across dozens of dashboards and tools. As data volumes surge, the task of troubleshooting becomes not only overwhelming but often nearly impossible.

At the core, this isn't just a complexity problem. It's a data quality problem. Before organizations can leverage advanced technology like Artificial Intelligence for IT Operations (AIOps), they must first confront a foundational yet often overlooked challenge: data harmonization.

Introducing Selector's Data Hypervisor: Your Path to Unified Data

Selector recognized early on that before AI could revolutionize network operations, enterprises first needed a smarter, more unified way to handle their data. That's why Selector built its unique Data Hypervisor technology – an innovative approach that transforms the way organizations ingest, enrich, and leverage network, infrastructure, and application data across all seven layers of the stack.

Much like a virtualization hypervisor decouples virtual machines from the physical hardware beneath them, Selector's Data Hypervisor decouples your diverse data sources from their native formats. The hypervisor ingests every type of operational data imaginable – logs, metrics, events, configurations, operational states, and flow data from networks, infrastructure, and applications – then automatically normalizes and enriches this data to provide a unified, vendor-agnostic view. This data normalization makes previously siloed data streams ready for advanced analytics and unified dashboards, thereby eliminating the need for costly manual correlation.

But normalization is only part of the story. The Data Hypervisor also enriches incoming data with critical contextual metadata, such as labels indicating location, peer devices, circuit IDs, customer name, or application relationships, making the data more meaningful. Context transforms isolated events into actionable insights, bridging the gaps between siloed tools and datasets.
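To make the normalize-then-enrich idea concrete, here is a minimal sketch under assumptions: the two vendor formats, field names, and metadata table below are all invented, and a real pipeline (as the next section explains) would use ML rather than hard-coded mappings.

```python
# Toy normalize-then-enrich: map vendor-specific records onto one schema,
# then attach contextual labels (location, circuit, customer).

SITE_METADATA = {  # hypothetical enrichment source, e.g. a CMDB
    "edge-sw-07": {"site": "Ashburn", "circuit_id": "CKT-1042", "customer": "Acme"},
}

def normalize(raw: dict) -> dict:
    """Map two hypothetical vendor formats onto one unified schema."""
    if "syslog" in raw:  # vendor A: syslog-style record
        return {"device": raw["host"], "event": raw["syslog"], "ts": raw["time"]}
    return {"device": raw["node"], "event": raw["msg"], "ts": raw["timestamp"]}

def enrich(record: dict) -> dict:
    """Attach metadata labels so downstream analytics can correlate
    records that arrived from different silos."""
    return {**record, **SITE_METADATA.get(record["device"], {})}

raw_events = [
    {"host": "edge-sw-07", "syslog": "IF-DOWN Gi0/3", "time": 1700000000},
    {"node": "edge-sw-07", "msg": "optics degraded", "timestamp": 1700000012},
]

for raw in raw_events:
    print(enrich(normalize(raw)))  # both records land in one labeled schema
```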
How Selector Uses Machine Learning to Automate Data Enrichment

Traditional methods for parsing and enriching data often depend on rigid rules and manually maintained regular expressions. This is a fragile, maintenance-intensive approach. Selector's Data Hypervisor replaces these outdated methods with advanced machine learning models that automatically interpret and structure unstructured or semi-structured data.

Rather than needing thousands of handcrafted parsing rules, Selector's ML-driven approach quickly and accurately extracts relevant information, categorizes events, identifies anomalies, and clusters related issues. This capability drastically reduces manual overhead and error rates, enabling IT teams to shift their focus from managing data to solving actual problems. This isn't just theoretical: Selector customers consistently achieve drastic reductions in alert noise – up to a 98% reduction in ticket volume – enabling teams to focus immediately on real issues.

Laying the Foundation for AI-Driven Insights

Selector's approach to data harmonization is more than just operational convenience. It is essential groundwork for full-stack, AI-driven network operations. Studies in machine learning research emphasize that raw data preprocessing is a challenging and time-consuming task that directly impacts model performance, with a substantial portion of raw data requiring transformation before it becomes useful for AI applications. Selector's meticulous data enrichment and normalization significantly enhance the usability of data collected from all layers, ensuring that the resulting insights and predictions are accurate, actionable, and trustworthy.

Furthermore, Selector's solution delivers immediate value. Unlike traditional approaches that require months of extensive setup, Selector can begin providing insights within days, without the need for massive infrastructure investments, such as GPUs. This rapid time-to-value, combined with cost efficiency, makes Selector not only powerful but also practical for businesses looking to make AI-driven operations a reality.

What's Next: From Unified Data to Autonomous Networks

Effective AIOps isn't just about adopting AI tools, but also about thoughtfully preparing your infrastructure to support them. Selector's Data Hypervisor clears away the chaos, laying a robust foundation for next-level AI applications, such as automated correlation, natural language querying, conversational interfaces, and autonomous network operations. In our next blog, we'll explore how Selector leverages machine learning to correlate network events in real time, unlocking automated insights and laying the groundwork for predictive analytics and AI-driven automation.

Ready to transform your network operations? Schedule a demo today to see Selector in action, and follow us on LinkedIn or X to be notified of the next blog in this series as we continue your journey toward autonomous network management.
The 2025 Gartner Market Guide for Event Intelligence Solutions arrives at a critical time for organizations facing increasing complexity in managing IT events. Today's diverse, distributed IT environments create significant operational challenges – alert fatigue, fragmented tools, and slow incident response – impacting both efficiency and customer experiences. Event Intelligence Solutions (EIS) address these challenges directly, employing AI and advanced analytics to simplify, accelerate, and automate event management.

What Gartner's Event Intelligence Solution Requirements Mean for Your Business

Gartner identifies five mandatory features that an Event Intelligence Solution should include to overcome the most pressing challenges faced by IT and network teams.

How Selector Aligns with Gartner's Event Intelligence Solution Requirements

Selector's inclusion as a representative vendor in Gartner's 2025 Market Guide for Event Intelligence Solutions highlights its position as an innovative leader. Through powerful, differentiated capabilities, Selector not only meets but also surpasses Gartner's essential criteria.

Unified Cross-Domain Visibility: Selector's advanced integration capabilities ingest telemetry data seamlessly from over 300 sources. This comprehensive integration breaks down data silos and provides unprecedented visibility across the entire IT landscape.

Dynamic Topology Mapping and Digital Twin: Selector dynamically visualizes and continuously updates the topology of network, infrastructure, and service dependencies. Selector's Digital Twin also enables users to model hypothetical scenarios, predict potential failures, and optimize resource allocation, providing strategic insights beyond basic visualization.

ML-Driven Event Correlation and Contextual Enrichment: Selector employs sophisticated machine learning algorithms to correlate events intelligently, dramatically reducing noise by up to 95%. Selector's advanced event correlation, root cause analysis, and smart alerting empower teams with actionable, enriched context directly within their collaboration tools, streamlining incident response.

Predictive Analytics and Anomaly Detection: Selector sets itself apart through powerful predictive analytics capabilities. Its AI-driven forecasting and anomaly detection proactively identify issues, significantly minimizing incidents by anticipating disruptions before they impact services.

Selector Copilot and Automated Remediation: Selector's innovative Copilot uses advanced conversational AI and Natural Language Models (NLM) to simplify complex incident investigations through intuitive, plain-language interactions. This makes advanced analytics accessible across the entire organization. Additionally, Selector integrates seamlessly with automation platforms and ITSM solutions, automating incident remediation workflows efficiently.

Selector's Strategic Advantage

Selector continuously innovates, delivering a scalable, intuitive platform that enhances resilience, minimizes downtime, and accelerates incident response. As organizations navigate increasingly complex IT environments, Selector positions teams to tackle today's operational challenges effectively and proactively prepare for future demands. To learn more about how Selector aligns with Gartner's strategic insights and can drive significant value for your organization, schedule a demo with one of our network experts.
To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.
Network operations have become increasingly complex due to the distributed nature of modern applications, which use data from private data centers, public clouds, and the internet to provide end-user services. With the adoption of these multi-cloud, multi-tier application architectures, network engineers must integrate new services (e.g., AWS Direct Connect and Kubernetes clusters) from cloud providers into their existing services. To make operations more challenging, the operational and data access methods of different network services vary widely. For example, cloud providers only provide an API (e.g., CloudWatch) to get operational metrics, while traditional networking equipment provides SNMP, streaming telemetry, and NETCONF interfaces. As a result, introducing new network services and MOPs is expensive, as they can disrupt networks carrying revenue-generating traffic.

Introducing Selector's Network Digital Twin (NDT), a digital representation of your network created from real-time data. This service enables efficient decision-making and experimentation without interfering with the real network, reducing risks to your infrastructure. The NDT gathers all network metrics, logs, and events in real time and draws correlations using ML. Unlike legacy solutions, which use "simulation software," the NDT relies on real-time operational metrics and logs. It visualizes and analyzes the present network state, learns from the past, and fast-forwards to predict future scenarios.

Real-Time View of the Network

The NDT offers a real-time view of the network and service topologies. It shows the quality of different paths, including end-to-end latency, any packet drops along the path, and queue build-ups. This information allows for faster troubleshooting and increased resilience by pinpointing the exact moment an issue occurs and tracing its effects across the network. The paths are learned through a variety of mechanisms, such as real-time BGP-LS streams, RIBs via BMP, and LLDP polling via SNMP.

Network DVR

The NDT records all historical data depending on the retention policies configured on the platform. This ability is used to play back past events to diagnose the root causes of an issue. Outages and chronic issues are often caused by a slow buildup of bad events; for example, a slight topology change increases the number of hops in a path, thus increasing latency. The NDT addresses such issues by detecting the latency increase using ML, then "DVRing" back into the past to show when the topology changed, and highlighting the configuration change that may have mistakenly increased the cost of a link, causing the path change.

What-If Analysis

Using the experimentation platform of the NDT, customers can stress-test network resiliency against link/optical faults, device failures, or the impact of excess traffic from a potential DDoS attack. Users can identify the sites that are prone to blackouts by simulating link/node failures. For example, a simulation scenario can increase the number of prefixes advertised by a route reflector; one can then observe whether any circuit breakers that limit prefix advertisements get tripped. This feature allows network operators to determine potential choke points in the network and address them proactively. (A minimal what-if sketch appears below.)

Growth Planning and Cost Optimization

The NDT allows customers to identify and resolve network bottlenecks and repurpose underutilized resources. Customers can model different growth scenarios as new users or new services come aboard the network.
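As referenced above, here is a minimal what-if sketch. It assumes the networkx graph library and a small invented topology with made-up latencies; a real NDT would derive the graph from live telemetry, so this is an illustration rather than Selector's implementation.

```python
# Toy what-if experiment: fail a link in a copy of the topology and
# compare end-to-end path latency before and after.

import networkx as nx

G = nx.Graph()
for u, v, ms in [("sjc", "den", 25), ("den", "atl", 30),
                 ("sjc", "dfw", 40), ("dfw", "atl", 20)]:
    G.add_edge(u, v, latency=ms)

def path_latency(g, src, dst):
    return nx.shortest_path_length(g, src, dst, weight="latency")

baseline = path_latency(G, "sjc", "atl")        # sjc-den-atl = 55 ms

what_if = G.copy()                              # experiment on a copy,
what_if.remove_edge("sjc", "den")               # never on production
rerouted = path_latency(what_if, "sjc", "atl")  # sjc-dfw-atl = 60 ms

print(f"baseline sjc->atl: {baseline} ms; after link failure: {rerouted} ms")
```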
This type of scenario planning helps customers manage their CapEx and OpEx budgets and run a profitable network. By using ML-based predictive algorithms, the NDT can proactively inform operators that the number of 10G ports in a given network location should be increased, as there is likely to be an increase in user demand in that location.

Summary

Selector's Network Digital Twin transforms how you operate your network. The NDT is the DVR of networking: by looking into the past, it helps you make decisions for the future. The experimentation platform allows you to test new services and network configurations without putting your existing networks at risk. Ready to see your network in a new light? Contact Selector today to learn more about our digital twin solution.
Understanding the state of your network and infrastructure is a critical responsibility for operations teams. Without their ever-watchful eye, network issues can cause problems ranging from annoying performance degradation to downtime. To detect, prevent, and address these issues, operations teams have relied on a combination of monitoring and manual correlation, leveraging whatever tools were available. This approach has proven to be time-consuming for staff, prone to human error, and can lead to cascading issues when problems are misdiagnosed. Moreover, the complexity and volume of data generated by the environment continue to grow, straining resource-strapped Ops teams.

Fortunately, a combination of observability and AIOps introduced by Selector offers a new approach to address these challenges. Selector auto-discovers devices, metadata, and relative topology—and stores the data within a hosted Configuration Management Database (CMDB). The collected metadata informs per-device profiles that drive telemetry collection. The telemetry is then continuously analyzed by AI/ML, correlating anomalies and identifying root causes. The overall solution accelerates troubleshooting, enabling operators to focus on remediation.

Discovering Devices and Collecting Topology

A key advantage of Selector's solution is its ability to automatically collect and leverage the inventory and topology of the network. Selector offers an agent to discover and map your network infrastructure. The process works similarly to how a network engineer might accomplish the task: using credentials (kept securely, on-site) to incrementally log into the various network devices in your environment and collect the related topology data. Once connected to a device, Selector fingerprints the vendor, model, and version. Network topology data such as ARP, STP, CDP, LLDP, routing protocol sessions, and other data are then used to decide where to go next to repeat the process.

Syncing Network Topology

Topology data and related metadata are synced to Selector's internal CMDB. CMDBs were historically little more than a basic file containing details of an organization's IT environment—names of network entities, IP addresses, and credentials. However, in today's complex IT environments, CMDBs have become critical tools for IT administrators. Warehousing the complete inventory, topology, and metadata for all discovered devices, Selector's CMDB subsequently serves as the single source of truth for everything related to your environment.

Customers may use Selector's CMDB to:

However, Selector's CMDB has another crucial role: enabling Selector's AI/ML to connect the dots across your devices.

Telemetry Collection, Processing, and Outcomes

Using a lightweight agent, Selector collects SNMP and streaming telemetry from each discovered device, exercises network connectivity with synthetic traffic, and receives syslogs. The agent then securely transmits the associated telemetry to an instance of Selector hosted in the cloud or on your premises. Selector continuously analyzes this telemetry, applying AI/ML techniques to baseline the received telemetry and identify any anomalies. Next, contextual, temporal, and topological correlation determines the relationship between those anomalies. Associative models then identify the probable root cause of emerging incidents. The solution is further differentiated through Selector's AI/ML-driven Smart Alerting.
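To give a feel for what baselining telemetry can mean in practice, here is a toy sketch (invented data, window, and threshold; not Selector's models) that flags a metric when it deviates sharply from its rolling baseline rather than crossing a static threshold.

```python
# Toy baselining: learn a rolling mean/stdev per metric and flag points
# that deviate beyond k standard deviations. Real systems also model
# cyclicity and seasonality; this keeps only the core idea.

from statistics import mean, stdev

def anomalies(series, window=5, k=3.0):
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(series[i] - mu) > k * sigma:
            flagged.append((i, series[i]))
    return flagged

latency_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(anomalies(latency_ms))  # [(7, 95)]: only the spike stands out
```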
Smart alerting works through the baselining of telemetry—understanding what is typical for your environment—rather than relying on static, manual thresholds like legacy solutions. This helps ensure that alerting is meaningful rather than arbitrary. Selector's correlations automatically consolidate and deduplicate the telemetry data collected by the system, ensuring the team receives discrete, actionable alerts. Selector's correlations can also be represented as a JSON object, through which the team can drive various operations activities—for example, ticket creation/maintenance and even closed-loop automation. A genAI-based conversational interface further enables operators to explore telemetry using natural language, and dynamic dashboards automatically provide relevant views into telemetry of interest.

Conclusion

Selector's modern network observability offers a robust solution for operations teams to efficiently detect, prevent, and address network issues. With its automated device discovery, telemetry collection, and smart alerting capabilities, Selector's solution automates much of the toil involved in setting up an effective observability platform. Further, AIOps moves teams toward proactive management of their networks, helping to reduce downtime and improve overall performance.
Selector returned to Networking Field Day this year to present our latest developments in network AIOps. Cofounder and CTO Nitin Kumar, along with VP Solutions Engineering Debashis Mohanty and Principal Solution Architect John Heintz, explored how Selector's GenAI-driven conversational interface promises to not only address today's network operations challenges but transform the industry. Read on to catch the highlights of the live-streamed presentation, which took place on July 11, 2024.

Today's Network Operations Challenges

The scale and complexity of modern networks have made it increasingly difficult to make sense of the available data and support incident remediation workflows. For instance, if cash registers at a retail establishment cannot connect to the payment application, the issue could reside with a number of different elements within the stack, including the Wi-Fi, local area network (LAN), software-defined wide area network (SD-WAN), Internet, multiprotocol label switching (MPLS), cloud services (e.g., AWS, Azure), or even the payment application itself (Figure 1). Experts or vendors for each of these domains may have access to the associated data, but the average user cannot collect all the data nor holistically interpret it.

The ability to access the data and associated insights from all of these domains is known as data democratization. Selector supports data democratization through our GenAI-powered conversational interface as well as our consolidated smart alerting—both available via the customers' preferred communication platform.

Bridging the Chasm

Obstacles currently exist which prevent the application of public AI services (e.g., ChatGPT, Microsoft Copilot, Google Gemini) to data originating from network infrastructure. To start, there's the logistical issue of transporting the on-premises network operations data to these predominantly cloud-based services. The raw data is not only spread across thousands of devices, but often originates from different domains, vendors, and protocols. Further, the data exists in multiple formats, such as metrics, logs, events, and topology, and the accompanying metadata may be inconsistent and malformed. Additionally, these AI services are trained on general English—not the nuanced semantics of network operations data. And lastly, these AI services mine your data and pass it along to other organizations, presenting a significant security and privacy risk.

Selector solves these issues, and effectively bridges the chasm, through our innovative approach to data collection, processing, and analytics, in conjunction with the application of precisely trained LLMs (large language models).

Mapping English Phrases to Queries

Selector's processing of English phrases and subsequent application of GenAI relies on two core features. The first is the addition of a uniform query interface to the data storage layer. For this interface, Selector chose SQL (Structured Query Language). The second is LLM translation of the English phrase to a SQL query so that it can access the query interface. The diagram below (Figure 2) depicts how Selector leverages our LLM to translate an English phrase into a query, as seen in the translation layer. The returned query then runs on the query layer, which exists above the data storage layer.

Inside the Selector LLM

The Selector LLM first determines which table or tables to query. Then, it focuses on keywords (e.g., Ashburn, GigE100*, last 2 days), applying these as filters to the table.
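As a concrete illustration of this mapping, consider a hypothetical translation of one such phrase. The table and column names below are invented, since Selector's actual schema and SQL dialect are not shown in the presentation.

```python
# Hypothetical phrase-to-query mapping: the LLM picks a table, then
# turns keywords into filters. Schema names are invented for illustration.

phrase = "Are there any port errors in Ashburn on all GigE100* in the last 2 days?"

sql = """
SELECT device, interface, error_count
FROM   interface_errors                  -- 1. the LLM picks the table
WHERE  site = 'Ashburn'                  -- 2. keyword -> filter
  AND  interface LIKE 'GigE100%'         --    wildcard from GigE100*
  AND  ts >= NOW() - INTERVAL '2 days'   --    time-range filter
ORDER BY error_count DESC;
"""

print(f"{phrase!r}\n-> {sql}")
```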
The LLM manages these tasks through a process known as imputing, or inferring (Figure 3). A typical function takes input and computes output using a set of parameters. An LLM takes output and, using a model, imputes a function and its parameters—a manner of reverse mapping.

At its core, the Selector LLM relies on a base, open-source model from the public domain. From there, it is trained with English data, enabling it to understand English-language phrases. Next, Selector fine-tunes the model with a dataset that teaches it to convert from English to Selector's SQL query interface. These two steps do not involve the customer and occur in Selector's cloud. We publish and deploy updates to this model periodically. Then, for each customer deployment, Selector fine-tunes the LLM with client-specific data and entities—a process which occurs on-premises at the customer. In this way, models are customized to each client's unique solution. For example, Customer A's model is different from Customer B's and Customer C's (Figure 4). The models are atomic and can be on-premises or in the cloud, depending on the client's preference.

Selector Copilot

Selector Copilot, our GenAI-driven conversational interface, enables network operators to ask questions in plain English. For example, an operator might prompt Selector Copilot to "Show me errors in Ashburn" or pose the question, "Are there any port errors in Ashburn on all GigE100* in the last 2 days?" For each query, Selector Copilot returns a visualization and summary of the results. Users can then interact with Selector to explore these results. They can also copy and paste each visualization from Selector Copilot onto a dedicated dashboard.

As an example, let's say a cloud service provider asks Selector Copilot about the latency of their transit routers over the last two days. Selector Copilot delivers a visualization and summary, revealing higher latency within a certain region. The user can then drill down into network topology information to further investigate the higher latency, accessing relevant details about devices, interfaces, sites, and circuits.

Selector Alerting

Selector's consolidated alerts reveal the nature and severity of an incident along with the probable root cause and a comprehensive list of affected devices. Selector's alerting process relies on a pipeline in which machine learning and statistical algorithms first determine violations or anomalies across all network, application, and infrastructure data. These events are then clustered into correlation trees or graphs which indicate a single incident. Selector summarizes these underlying event correlation graphs into consolidated alerts delivered via the customers' preferred collaboration platform, such as Slack or Microsoft Teams. Alerts can also be configured to automatically create tickets and map all relevant information into them. This enables downstream teams to commence remediation activities.

Watch the Selector presentation at Networking Field Day 35
Quality medical care today relies on "health systems" built from geographically distributed healthcare settings such as hospitals, urgent care clinics, imaging centers, nursing homes, pharmacies, and specialist offices, among many others. Each setting shares data within the broader health system through Electronic Medical Records (EMRs). EMR systems, which were purpose-built to manage patient records, help improve patient outcomes through the real-time sharing of patient data. Several EMR systems are available today, with Epic being the most prominent. Over time, EMRs have evolved into a broader healthcare operating system, helping administrators manage health records, nurse work assignments, nurse reviews, employee assessments, payroll, budgeting, recruiting, patient portals, patient records, and more.

Because EMRs are so deeply ingrained into the hospital's business, even the slightest performance issue materially affects the entire organization. Unfortunately, performance issues occur frequently, and the impact on the organization is often severe. This is because the EMR system is part of a complex healthcare IT ecosystem, where an issue in one place, such as the network, can wreak havoc across all the systems within the environment. Given these risks, organizations must ensure the proper health of the IT environment across their stack—from network, compute, and storage infrastructure to cloud and EMR applications.

Historically, administrators relied on a suite of products to help address these challenges. IT teams would manually inspect the data and connect the dots to resolve problematic issues. As one might imagine, this process is time-consuming and requires significant effort on behalf of the team. Fortunately, Selector's purpose-built solution for healthcare observability delivers comprehensive full-stack observability and powerful AI/ML-driven capabilities, such as root cause analysis and event correlation, that continuously analyze the environment, surfacing insights in real time.

The Challenges of Monitoring Healthcare IT

The historical challenges of performance monitoring and troubleshooting EMRs are mainly due to the complexity of the ecosystem they sit within. This ecosystem includes network, compute/SAN, virtualization/VSAN, and applications, and is often composed of hundreds to thousands of servers, switches, routers, firewalls, access points, tablets, mobile desktops, and more.

Complex Ecosystem

Network

All services sit on top of the network, which can often involve thousands of different types of devices. Because network performance issues inevitably affect the services sitting on top of it, understanding network health is essential. However, this is challenging due to the complexity of modern networks and challenges such as Wi-Fi congestion, routing and switching issues, firewall misconfigurations, and transient capacity constraints, to name a few. These conditions all translate to difficult-to-troubleshoot problems across the IT environment.

Compute/SAN

On top of the network is the physical infrastructure, including CPU, memory, and disk, all connected via the network. Also at this layer are the various storage solutions employed by the health system to support the long-term storage of medical records and patient imaging.

Virtualization

The VMware ecosystem is built on top of compute and storage infrastructure and consists of the ESXi, vSphere, and VSAN services.
Monitoring VMware's health is essential because if the VMware hosts are not working correctly, the guest virtual machines are unlikely to operate as expected.

Applications

Many different applications support the hospital and are typically hosted within VMware. These include the various services that comprise Epic, including BCA (prescriptions), EPS (printing services), and BLOB, the binary object store that manages the different aspects of collecting and retrieving the data.

Limitations of Current Solutions

Given the complexity of healthcare IT ecosystems, issues can easily go unnoticed. Consider this scenario: Hard drives degrade within the SAN servicing the block store service—Epic's BLOB service, a critical component of the Epic ecosystem. Aggregate seek times, throughput, and general performance of the SAN begin to decline. Over time, users start to complain. Everything works, but things seem slower. Nurses, doctors, and administrators are all experiencing the issue. Epic's integrated monitoring—SystemPulse—shows the application is working as expected, but the reports continue. At this point, the team will start guessing what the problem is. Meanwhile, the performance degradation has begun to materially affect the organization.

Quickly resolving issues like these requires an observability solution that provides comprehensive visibility across the entire ecosystem, from the network through the applications. This solution also needs to collect and correlate telemetry across every single one of these domains. Further, due to the sensitivity of healthcare data, a solution must be able to securely collect data from on-premises environments. Unfortunately, many existing solutions do not meet all these needs. Products like SolarWinds, for instance, do not have all the data (they have no way of collecting Epic data, for example) and cannot bridge the data between different domains, so capabilities such as automated correlation and root cause analysis are not possible. Without a solution to address the challenges of this environment, identifying and resolving problems continues to be time-consuming, creating headaches for all healthcare staff trying to leverage essential systems like Epic on a routine basis.

Selector: Observability and AIOps for the Epic EMR Environment

To help hospital operators proactively optimize performance and identify issues before they cause problems, Selector provides a purpose-built technology for end-to-end observability of the healthcare IT environment, which can be deployed on-premises or in the cloud.

Comprehensive Visibility Enables True Full-Stack Observability

Selector provides comprehensive observability by collecting and analyzing real-time performance telemetry from the network to applications and everything in between. It collects the metrics, logs, and events, as well as all the metadata from every layer within the healthcare IT ecosystem—network, compute, storage, cloud, virtualization, applications—and warehouses the data in the Selector data lake. As a result, Selector enables true full-stack observability, accounting for the network layer when most other vendors have ignored this critical component.

Auto-baselining Immediately Surfaces Abnormalities

Selector applies machine learning to all metrics and logs from its ingest layer, surfacing anomalies.
It baselines telemetry in real time and immediately identifies when a system's performance deviates from its normal behavior, including accounting for cyclicity—what's expected during certain times of day—and seasonality—what's typical for certain times of the year. These baselines are then used to power dynamic alerting rules, sparing IT teams the burden of maintaining static, manual thresholds.
For the second consecutive year, Selector has been recognized as a Best Place to Work in the Bay Area. In 2023, the company was ranked among the top 10 companies with 25–49 employees. This year, the company ranked again in the top 10. This regional award is sponsored by the San Francisco Business Times and Silicon Valley Business Journal in collaboration with Quantum Workplace—an employee engagement data firm. The winning Bay Area companies for 2024 were announced in March, with the ceremony and ranking on June 13. Learn what this honor means for Selector as we continue to advance the world's first unified observability and AIOps platform.

Measuring Employee Engagement and Satisfaction

The Best Places to Work in the Bay Area award recognizes leading employers in the region. These companies stand out for enabling employee engagement and workplace satisfaction, cultivating an environment where employees are happily committed to their work, advocate for their company, and plan to stay with the company. To be nominated, companies must have an office in any of the twelve area cities and a minimum of 25 employees. These employees must work permanently at the company's Bay Area office or be US-based remote employees who report to that office.

After a company is nominated, Quantum Workplace conducts a survey to determine the companies where employees thrive, engage in meaningful work, and feel their voices are heard. For Selector, this survey went to employees who work at or report to our Santa Clara, CA headquarters. The survey allows employees to share their experience working for the nominated companies. It measures various research-validated workplace factors that impact employee engagement and satisfaction. For example, it inquires about compensation, benefits, employee engagement, and trust in senior leadership. Each question on the survey has a set of response options, each assigned a numerical value. Quantum Workplace uses these values to profile employees, calculate an overall score, and determine each organization's rank. The award winners are companies that achieve the highest ratings for being a fun workplace with a collaborative culture, solid compensation and benefits, fair management practices, and other amenities.

A Reflection of Selector's Commitment to Its Employees

Upon receiving the 2024 Best Places to Work in the Bay Area title, Selector Co-Founder and CEO Kannan Kothandaraman said, "This award showcases the quality and character of our employees, their dedication to each other and our company, and their steadfast commitment to delivering unparalleled solutions and customer outcomes."

Eric Moore, Selector's Vice President of Worldwide Sales, added, "When employees recognize a company's appreciation for them, they are motivated to perform above and beyond, to really take pride in their work. This not only drives product innovation but translates to an overall enhanced customer experience."

Survey feedback included the following employee comment: "I've held quite a few jobs over the years, and none have felt as engaging and exciting as my role at Selector. It's a really great balance between startup culture, where no two days are quite the same, and a mature company with stability and support."

Another employee commented, "I appreciate the innovative culture at our company [and] our commitment to excellence. I am fortunate to work alongside talented and dedicated colleagues, who inspire me to strive [to do my best].
I appreciate the supportive atmosphere where everyone is willing to share knowledge and help each other grow."

A Respectful, Supportive, and Collaborative Workplace

Selector strives to foster an environment that inspires its employees' curiosity and creativity, and empowers them to share ideas, enabling a respectful and collaborative workplace. The company offers:

Selector employees can also choose their preferred work environment—remote, hybrid, or on-site at its headquarters in Santa Clara—to best suit their lifestyle and productivity. Additionally, they can shift from one environment to another throughout the year as their lives and schedules require.

Another Achievement in 2024 for Selector

Selector's flexible, collaborative environment enables its employees to do their best work. In just the first half of this year, Selector achieved the following successes:

Winning this award is a further testament to the team's incredible drive and dedication—fostered by one of the best places to work within the Bay Area.
Network performance plays a key role in service delivery, acutely impacting user experience. However, enterprise teams have long struggled to gain comprehensive insight into network performance and, when necessary, the ability to hold circuit vendors accountable. Fortunately, Selector's advances in network monitoring and observability support detailed visibility into circuit performance, making it possible to establish service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs) for circuits. These correspond to the promises organizations make to their customers, the internal objectives that help organizations keep those promises, and the direct measurements organizations use to assess their performance. In fact, Selector can collect and analyze telemetry from your full stack, delivering insight into SLAs, SLOs, and SLIs not only for the network, but for infrastructure, cloud, and applications as well.

Network Reliability as a Discipline

SLAs, SLOs, and SLIs are used extensively within the discipline of site reliability engineering. However, the network domain has been historically underserved. Innovations in tooling, philosophies, and practices oriented around reliability and performance are largely applied elsewhere. Fortunately, we can learn from these innovations to improve delivery of the network. For instance, in the infrastructure and application spaces, we have the concept of a site reliability engineer (SRE). This role has become table stakes for any enterprise looking to competently deliver a service. In contrast, network reliability engineers (NREs) remain relatively uncommon. NRE responsibilities are similar to those of SREs, but specifically adapted to measuring and stabilizing the reliability of the network to align with enterprise goals.

A likely reason for the obscurity of NREs, as well as the absence of impactful practices such as circuit SLAs, SLOs, and SLIs, has been insufficient data and tooling. However, the tide has turned. Selector's unified monitoring, observability, and AIOps platform is uniquely positioned to provide teams with essential insight into network health and performance.

Measuring Circuit Performance with Selector

To achieve a comprehensive view of network performance, circuit performance must be taken into account. Selector helps operators assess circuit performance by providing unprecedented visibility into circuit KPIs such as latency, jitter, and error rate. Clients benefit from an integrated network monitoring solution, accessible through a unified dashboard, that replaces key functionality historically addressed by multiple tools.

Most crucially, Selector's platform assists operators with defining and continuously monitoring their SLAs, SLOs, and SLIs. To measure SLA compliance, for instance, Selector first defines the SLA with respect to circuit KPIs such as throughput, latency, jitter, errors, uptime, flaps, etc. These KPIs are then combined to calculate an overall SLA score. Selector will even adjust the score to take into account any maintenance windows published by circuit providers. For instance, if a provider announces they will be down for six hours due to system maintenance, that downtime will be excluded from the SLA calculation. (A simplified sketch of such a score appears at the end of this post.)

In the image below, a client is using a Selector dashboard to track circuit performance from several different network providers, revealing compliance across multiple kinds of links that are available in a particular location.
In the image below, a client is using a Selector dashboard to track circuit performance from several different network providers, revealing compliance across the multiple kinds of links available in a particular location.

At the top of the dashboard, each SLA card displays the aggregate performance across all the circuits offered by a given vendor. For example, all 100+ connections provided by Vendor-2 are summarized within the first card, on the upper left. According to this snapshot, the client can see that Vendor-2 is meeting its contractual commitments only 78.51% of the time, which, depending on the agreement, may be grounds to receive credits back or to trigger termination of the contract. Clients can also select a time period, such as a calendar month, over which to view SLA or SLO compliance as articulated within the commitment or agreement. The table below the cards lets customers investigate specific circuits of interest. For example, users can sort by circuit downtime or circuit availability; the tool provides a ranked list from best to worst. Clicking on a circuit ID lets users drill down into a detailed view of that circuit’s KPIs (jitter, latency, throughput, etc.), allowing operators to pinpoint specific issues and determine why a given circuit might be failing its SLA. A simple sketch of this aggregate-and-rank workflow follows below.
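As a hypothetical illustration of how per-circuit results might roll up into vendor-level cards and a ranked table, consider the following Python sketch. The circuit IDs and values are invented, a real deployment would pull these figures from the platform rather than hard-coding them, and the simple average used for the vendor rollup is an assumption, not Selector’s documented aggregation method.

from collections import defaultdict

# Invented per-circuit results standing in for platform-reported data.
circuits = [
    {"circuit_id": "ckt-101", "vendor": "Vendor-2", "availability": 99.20},
    {"circuit_id": "ckt-102", "vendor": "Vendor-2", "availability": 57.82},
    {"circuit_id": "ckt-201", "vendor": "Vendor-3", "availability": 99.95},
]

def vendor_cards(rows):
    # Roll availability up per vendor, like the SLA cards at the top
    # of the dashboard; rounded to two decimals for display.
    by_vendor = defaultdict(list)
    for row in rows:
        by_vendor[row["vendor"]].append(row["availability"])
    return {v: round(sum(vals) / len(vals), 2) for v, vals in by_vendor.items()}

def ranked(rows):
    # Sort circuits from best to worst availability, like the table.
    return sorted(rows, key=lambda r: r["availability"], reverse=True)

print(vendor_cards(circuits))   # {'Vendor-2': 78.51, 'Vendor-3': 99.95}
for row in ranked(circuits):
    print(row["circuit_id"], row["availability"])

The toy values are chosen so Vendor-2’s two circuits average to 78.51%, matching the compliance figure described on the dashboard card.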
A Trusted Third-Party Assessment of Circuit Performance

Selector delivers key circuit performance data directly to clients, so they no longer need to rely on reporting from vendors. In essence, Selector provides a neutral, third-party assessment of circuit performance and circuit SLA compliance. Once compliance insights are generated, Selector can consolidate them into a report and send it to internal and external stakeholders. These reports, which can be generated daily, weekly, monthly, or quarterly, facilitate collaboration and decision-making within an organization. For instance, management or procurement might want to know which circuit vendors are performing best, so they can confidently renew those relationships. Alternatively, these teams may want to know which circuit vendors are underperforming, so they can replace them or negotiate a discount.

Choose Success

The scale and complexity of network infrastructure continue to grow, as do the demands placed on it by service providers and their end users. Operators must rise to the challenge, building and maintaining resilient network solutions that prioritize reliability, availability, and performance. Now, more than ever, these teams must select the appropriate tooling to support their efforts. With robust monitoring capabilities and an impressive suite of AI/ML-powered features, Selector helps operators not only meet today’s rigorous scalability demands but prepare for those of tomorrow. What’s more, Selector can apply these strategies from network to application, and everything in between.

Selector has received the 2024 Data Breakthrough Award for Data Observability Innovation of the Year in the Data Management category! This accomplishment marks the second consecutive year Selector has received a Data Breakthrough Award, after winning Best AIOps Platform in 2023. In this post, learn more about the award and how it reflects our unique approach to data observability.

Behind the Data Breakthrough Awards

The Data Breakthrough Awards span various categories, including data management and analytics, compute and infrastructure, and industry applications and leadership. Winners are determined through a thorough review, scoring, and analysis of top companies, startups, and award-winning organizations. Among this year’s winners are Pure Storage, Dremio, Alteryx, Sumo Logic, Western Digital, and Red Hat. A representative for Data Breakthrough said, “Our goal is to deliver the most comprehensive analysis of the data technology industry each year. And with over 2,250 nominations coming in from all over the globe for our 2024 program, the industry evaluation was broad and extremely competitive this year!”

A Major Achievement for Selector

In recognizing Selector, the Data Breakthrough Awards placed our platform among the best data companies, products, and services around the world that have “broken through” the crowded data technology market in 2024. Selector’s industry-leading technology simplifies today’s complex and sophisticated IT landscape by merging monitoring, observability, and AIOps into a single platform. At its foundation, Selector leverages advanced artificial intelligence (AI) and machine learning (ML) techniques to drive transformative features, including anomaly detection, event correlation, root cause analysis, and smart alerting. The Selector platform provides teams with a single pane of glass and key functionality historically addressed by multiple tools, enabling them to alleviate tool sprawl, enhance operational efficiency, and zero in on improving the customer experience. This recognition is among the many Selector continues to receive for our achievements in observability and AIOps.