AI for Network Leaders — Powered by Selector

Virtual sessions available on-demand now!


Selector AI blog

Discover how AI, automation, and observability are transforming network operations. The Selector AI Blog shares expert perspectives, technical deep dives, and real-world insights for IT, engineering, and operations leaders.

All Articles

Flexibility Without Friction: Custom Synthetics for Modern Monitoring

In today’s distributed systems, visibility isn’t optional — it’s critical. If a DNS resolution slows down, an API responds incorrectly, or a regional network segment drops packets, customers need to know immediately. Our Synthetics feature was built for that purpose: it runs a suite of predefined network probes (like ping and traceroute) and pushes the resulting metrics — latency, jitter, packet loss — into a metric store for monitoring and alerting. That’s worked well for standard use cases. But over time, one thing became clear: customers needed more flexibility. Suppose a customer wanted to check DNS latency to an internal domain, validate TLS handshakes, or simulate a multi-step API transaction. These types of checks didn’t fit into the fixed list of built-in probes. Supporting them meant updating the underlying agent — a process that’s slow, inflexible, and often out of sync with customer needs. So we asked: what if customers could write and run their own probes — without needing to wait on us?

The What: Introducing Custom Synthetics

Custom Synthetics is a new capability that lets customers define their own network and service checks in Python, upload them to the platform, and have them run just like our built-in probes. The resulting metrics flow through the same telemetry pipeline and end up in the customer’s metric store, ready to be charted, queried, and alerted on. This gives customers full control over what to measure, how to measure it, and what constitutes success or failure — all without needing to deploy a new agent version or submit a feature request. Here’s how it works in practice:

This design means customers get the best of both worlds: the flexibility to define their own probes, and the reliability of a first-class monitoring pipeline.

How It Works: Behind the Scenes

The technical foundation of Custom Synthetics focuses on security, flexibility, and observability.
Here’s a breakdown of the core architecture:

This system lets customers move fast, experiment freely, and maintain visibility across increasingly complex networks and services — all without sacrificing control or safety.

Sample Custom Synthetic Probe

This probe takes in a list of URLs and generates the LoadTime metric for each of those URLs. Once the probe is uploaded and attached to a compute resource, along with the associated configuration parameters like the URL list, it is executed at a configured interval, and the metrics it generates are shipped to a metric store. One can then visualize these metrics on dashboards, or configure alerts and other workflows. Following is one such visualization, where the above probe was configured with the URLs www.amazon.com, cnn.com, github.com, google.com, and meta.com. Here we have the load time for each URL as a time series.

Use Cases: Monitoring Without Limits

Here are just a few examples of how Custom Synthetics could be leveraged:

DNS Resolution Monitoring
Many customers manage private DNS zones or depend on third-party resolvers. A custom probe can resolve critical domains from different locations and emit latency and success/failure metrics, helping detect regional DNS degradation or failures.

Authenticated HTTP Checks
Built-in probes can’t always deal with real-world scenarios like token-based authentication or POST requests with payloads. With a custom probe, customers can simulate full API requests, validate response bodies, and verify SLA compliance with business-critical services.

TLS Handshake Timing
For services requiring secure connections, customers can create probes that measure the time it takes to complete a TLS handshake — a useful indicator of certificate issues or misconfigured CDNs.
Multi-Step Logic Checks
One could build a probe that performs a series of dependent API calls to simulate a full user journey — checking that the right objects are returned and that the data remains consistent across services. These kinds of probes are incredibly valuable — and until now, they required workarounds or custom tooling. With Custom Synthetics, they’re native and first-class.

Why We Built It This Way

There were easier paths: we could have added more built-in probes, or created a limited DSL for customers to configure predefined actions. But we believe in building tools that scale with customer creativity, not limit it. That’s why we chose Python — it’s accessible, expressive, and familiar to most engineers. Combined with strict sandboxing and a simplified output model, it lets customers go from idea to live metrics in minutes, without compromising system safety or observability. From an engineering perspective, this meant investing in:

Our goal wasn’t just to make probes extensible — it was to make extensibility feel native.

Observability on Your Terms

Custom Synthetics unlocks a new level of control in monitoring setups. No more waiting for platform updates. No more hacking together one-off tools. Now, when customers need to observe something specific — whether it’s a DNS resolver, a slow API, or a flaky external dependency — they can write a probe and ship it themselves. All the heavy lifting is handled by the platform. Customers write the logic, we run it securely, and the metrics show up exactly where they should — ready to power alerts, dashboards, and decisions. Whether you’re a platform team enforcing SLAs, a network engineer debugging regional anomalies, or an application developer catching regressions before your users do, Custom Synthetics gives customers the flexibility they need — without the friction. Check out the docs, explore real-world examples, and start building your first custom probe today.
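To make the URL load-time probe described earlier concrete, here is a minimal, self-contained sketch. It is illustrative only, not Selector’s actual probe API: the metric-record shape and field names (url, LoadTime, status) are assumptions, and a real probe would hand its metrics to the platform’s collection interface rather than return them.

```python
import time
import urllib.request


def probe_load_time(urls, timeout=10):
    """Fetch each URL and record how long the full response takes.

    Returns one metric record per URL, mirroring the LoadTime metric
    described above (the record layout here is illustrative).
    """
    metrics = []
    for url in urls:
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # include body transfer in the measured load time
            elapsed_ms = (time.monotonic() - start) * 1000.0
            metrics.append({"url": url, "LoadTime": round(elapsed_ms, 2), "status": "ok"})
        except Exception as exc:  # DNS failure, timeout, HTTP error, ...
            metrics.append({"url": url, "LoadTime": None, "status": "error: %s" % exc})
    return metrics
```

A probe like this, attached to a compute resource with a URL list such as the one shown above, would emit one LoadTime data point per URL per execution interval.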
To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.

Peaks and Pitfalls of LLM Usage in Production Systems

In the last eighteen months, Selector has integrated large language models into its operations to a greater and greater degree. We use them in both extractive and abstractive ways, seeking both to interpret natural language effectively and to communicate the way the most seasoned network engineers might in the real world. The evolution of the LLM space over this period, from the models themselves to the tools and frameworks used to work with them, has progressed at a phenomenal pace. Tools that we used initially have already been deprecated or abandoned, new and better ones have sprung up, and we’ve been forced into rapid adoption and rapid migration scenarios along the way. It’s been quite a journey, and we’ve learned a tremendous amount. Thinking back over the process of building Selector Copilot, our chat-style question-and-answering system that allows you to “talk” to your network, we’ve picked up many valuable lessons, learning both from the pitfalls we managed to dodge and the ones we stumbled into. Copilot is the last step of a vast pipeline of data enrichment and interpretation, stacking layers of increasing insight using traditional machine learning techniques and transformation steps. Each step builds upon the last, bubbling its way up toward the final output – a concise and accurate natural language summary of the state of your network and devices at any given moment. Getting the data to the point where it can be handed off to an LLM requires careful consideration, with lots of opportunities for errors along the way. These are just a few of the most important lessons we’ve learned while integrating an LLM into the Selector app.

1. Overestimating the LLM’s inherent understanding beyond language

Fundamentally, a large language model is good at language. It does not know your data, the particulars of your naming conventions, the topologies in which your devices are organized, or the interconnections between them.
It cannot accurately determine root causes, correlate events, or explain why something might be happening in a non-generic way. Given the opportunity, it can and will confidently make up answers that sound reasonable but are fundamentally inaccurate. This highlights the importance of data processing using traditional machine learning and transformation methods, so that the final context presented to the LLM already contains the answers being sought. A few examples…

Even beyond that, our data passes through several final stages of transformation specific to natural language presentation. This further refines the data in a way tailored to how the LLM needs and understands context data. In fact, we go so far as to generate a pseudo-natural-language summary, using traditional data-processing techniques, that serves as a hint to the LLM as to what to report. Needless to say, what we’ve learned is to leave nothing open to interpretation when it comes to LLM consumption. Especially in mission-critical environments, it is imperative to get this part right.

2. Context is King

I remember in my elementary and high school math classes, I’d be given word problems that would include some related, but ultimately irrelevant, facts. These would serve to see if I could filter the signal from the noise and not get thrown off by data points that weren’t part of the final calculations. Inevitably, though, I’d get snagged by one of these superfluous nuggets and end up with the wrong answer. If it can happen to a human, it can happen to an LLM (now that I say that, I may have to give that some additional thought). Regardless, too much and too-dissimilar information can lead to some very unexpected responses. For example, the Selector app is constantly scanning all known datapoints and events, correlating them into records that represent single anomalies. These records are labeled by device, interface, subnet, and so on, allowing us to quickly check a specific entity for issues.
Initially, we attempted to scan across the entire collection looking for all anomalies related to an entity, indiscriminate of the anomaly or event type. However, this proved problematic, as the LLM would incorrectly connect the dots between these often unrelated anomalies in strange and interesting ways, presenting the user with a confusing and convoluted summary. We’ve since implemented more robust filtering to ensure only genuinely related anomalies are grouped for summarization. Of course, the best-case scenario would be striking that perfect balance of providing just enough information to answer a question, and nothing more. We have employed a variety of techniques toward this end, including:

On the flip side, leaving out relevant information can starve the LLM, giving it no choice but to hallucinate and make up an answer. Our prompts always include instructions to report that it can’t determine an answer if relevant data hasn’t been provided, but invariably we see it happen anyway. For example, in the early days, when token limits were stricter and context windows were smaller, we would indiscriminately strip all numeric values from a dataframe before passing it along to the LLM. When questions such as “What is the latency between device A and device B?” were asked, the LLM wouldn’t have the information needed to report the average latency in the given time window. It did have information about whether the latency was within normal limits or in violation, but the actual value was not present. Today, we use a RAG (retrieval-augmented generation) mechanism that finds the top-n columns related to the user’s query and provides only those column values to the LLM. This has allowed us to be more precise in what is given to the LLM, without overwhelming it with superfluous information. These incremental learnings and adaptations have made a big difference in both the quality and accuracy of responses.

3. Poor or Vague Prompting

I won’t spend too much time on this one, as it’s pretty obvious. The prompts need a high level of detail. Leave no boundary undefined, provide examples, be specific.
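As a rough illustration of the top-n column selection mentioned in lesson 2, here is a minimal sketch. It is a deliberately simplified stand-in: a production RAG pipeline would score columns with learned embeddings, whereas this version uses bag-of-words cosine similarity, and all function names are invented for the example.

```python
import math
from collections import Counter


def _vec(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_n_columns(query, columns, n=3):
    """Rank dataframe columns by similarity to the user's query and keep
    only the top n, so the LLM sees just the relevant column values."""
    q = _vec(query)
    ranked = sorted(columns, key=lambda c: _cosine(q, _vec(c)), reverse=True)
    return ranked[:n]
```

For a query like “latency between device A and device B”, columns such as “avg latency ms” and “device name” would rank above unrelated ones like “cpu utilization”, and only those would be passed along as context.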

From Detection to Resolution: How Selector + Itential Deliver AI-Driven Observability and Automated Recovery

Every second counts when it comes to detecting, diagnosing, and resolving network incidents, yet many teams still find themselves stuck in reactive mode, drowning in alerts, manually writing scripts, and managing tickets across disconnected systems. This is where Selector and Itential come in. Together, Selector and Itential deliver a powerful, enterprise-ready solution that closes the loop between detection and action. Real-time observability, paired with policy-driven automation, enables the instant resolution of incidents with no manual intervention. For a closer look at the integration, view our webinar, “From Insight to Impact: Closing the Loop on Network Ops with Selector and Itential,” featuring Selector’s Head of DevRel, John Capobianco, and Itential’s Director of Tech Evangelism, William Collins.

AI-Driven Observability Meets Automated Remediation

The Selector + Itential integration creates a closed-loop, event-driven workflow designed for modern infrastructure teams. Here’s how it works:

This is observability and automation working in sync – from detection, to decision, to deployment.

A Real-World Example: Automated Port Reset

Imagine a scenario where a critical port goes down:

All of this happens without a single manual script, ticket update, or context switch – dramatically reducing mean time to resolution (MTTR).

Why Enterprises Trust Selector + Itential

Built for Modern Infrastructure

Whether you’re running large enterprise IT, telco networks, or hybrid cloud environments, Selector and Itential give you the tools to shift from reactive firefighting to proactive, automated operations. With this joint solution, you can detect issues in real time, trigger automation instantly, validate and track every action, and resolve incidents before users are even aware of them. Discover how Selector and Itential can help your organization move from detection to resolution in real time. View the webinar today.

Intelligent Incident Management with Selector AI

Imagine you are a NOC lead dealing with a network outage caused by a linecard failure. You are looking at potentially hundreds of incidents: link flaps, protocol flaps, packet losses, application latencies, and many more. Compare that with having just a single incident, in human-readable format, that lists all these impacts, highlights the linecard failure as the root cause, and suggests a remediation. Selector AI’s Intelligent Incident Management transforms incident handling by focusing on creating fewer, smarter incidents with complete context, drastically reducing Mean Time to Resolution (MTTR). This blog explores the core principles of this intelligent incident management approach.

Core Principles of Intelligent Incident Management

Data-Driven Correlations

Selector AI’s core strength resides in its data-driven correlation capabilities. The platform is designed to process a wide array of data, encompassing system logs, performance metrics, and events from existing monitoring tools. To manage this large influx of information effectively, Selector AI leverages machine learning algorithms. These algorithms play a crucial role in significantly decreasing the amount of data that human operators need to review manually, thereby improving efficiency and reducing cognitive load. One of the key methods employed for data reduction and anomaly detection involves techniques such as baselining and log mining. By establishing normal operational patterns (baselining) and inferring from log entries (log mining), the models can pinpoint deviations or interesting data points. This reduction of surface area enables a more focused approach to identifying potential problems that may be interconnected within overwhelming volumes of raw data. To identify these interconnected anomalies, Selector AI utilizes collaborative filtering algorithms.
These algorithms analyze data that has been enriched with additional metadata, allowing the system to create detailed representations, or embeddings, of the data points. These robust embeddings are instrumental in the engine’s ability to correlate anomalies that share some context. The correlations are constructed as graphs that users can visualize to understand the model’s thought process for correlating. The graphs are then processed by the causation models, which infer the root cause from the events that constitute the graph. These graphs, coupled with root cause analysis, are used to generate consolidated incidents, providing users with a comprehensive understanding of problems rather than a barrage of isolated incidents for each anomaly. A typical correlated incident from Selector packs anywhere from 10 to 500 individual alerts.

LLM-Enhanced Incident Descriptions

We should not expect that the correlated content produced by the algorithms can be understood by NOC users as is. This is where Large Language Models (LLMs) are leveraged to generate human-readable and actionable incident summaries, significantly enhancing communication and understanding. These summaries provide clear, concise descriptions of the problem, its potential impact, and recommended actions, which simplifies troubleshooting and decision-making for incident responders. The example below shows how an LLM-enhanced version of a correlated incident is presented.

Stateful Incidents

Incidents are episodic in nature, meaning their effects can persist over a period of time until the root cause is resolved. Reducing the alerts alone is not sufficient. Instead, incidents should be treated as dynamic entities that gather context over time. Rather than viewing incidents as isolated events, Selector incident management tracks their evolution, accumulating relevant data and insights as they unfold.
This provides a comprehensive picture of the incident’s lifecycle, facilitating more informed and effective resolution strategies, ultimately reducing the number of incidents to the number of unique episodes.

Maintenance Window Awareness

One practical aspect of incident management is the use of maintenance windows. Typically, incidents are created and NOC teams then close them as related to a planned change request, wasting many staff-hours. This is because maintenance windows are communicated out of band, via emails from the vendors. Selector’s incident management approach is maintenance-window aware. It can automatically recognize entities such as devices, circuit IDs, and time windows in unstructured email text. When the engine detects a correlated event, the event is checked against any active maintenance windows. Suppressing incident creation for matching events prevents a great deal of noise and ensures that incident responders focus only on genuine issues.

Benefits of Intelligent Incident Management

Reduced MTTR: Faster identification and resolution of incidents due to enhanced context and intelligent insights.
Fewer, Smarter Incidents: Filtering out noise and focusing on genuine issues leads to more efficient incident handling.
Enhanced Operational Efficiency: Automating alert correlation and providing actionable summaries saves time and resources for IT teams.
Proactive Issue Detection: Machine learning and analytics identify potential problems before they escalate into major incidents.
Improved Communication: Clear, human-readable incident descriptions facilitate collaboration and understanding.

Conclusion

Intelligent Incident Management, powered by Selector AI, represents a significant shift in how organizations handle incidents.
By integrating machine learning, large language models (LLMs), and other advanced algorithms, Selector AI transforms reactive processes into proactive strategies, reduces operational overhead, and significantly improves service availability. Embracing this approach enables organizations to manage their IT environments more efficiently and effectively.
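As a rough illustration of the maintenance-window awareness described above, the sketch below uses simple regular expressions to pull devices, circuit IDs, and a time window out of an email-like text, then decides whether to suppress an incident. Everything here is an assumption made for the example: the patterns, field names, and email format are invented, and Selector’s actual entity recognition is far more robust than regexes.

```python
import re
from datetime import datetime

# Illustrative patterns only: real vendor maintenance emails vary widely.
DEVICE_RE = re.compile(r"\b(?:rtr|sw)-[a-z0-9-]+\b")
CIRCUIT_RE = re.compile(r"\bCKT-\d+\b")
WINDOW_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) to (\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def parse_maintenance_email(text):
    """Extract devices, circuit IDs, and the time window from an email body."""
    start, end = None, None
    m = WINDOW_RE.search(text)
    if m:
        fmt = "%Y-%m-%d %H:%M"
        start, end = datetime.strptime(m.group(1), fmt), datetime.strptime(m.group(2), fmt)
    return {
        "devices": set(DEVICE_RE.findall(text)),
        "circuits": set(CIRCUIT_RE.findall(text)),
        "start": start,
        "end": end,
    }


def should_suppress(event, window):
    """Suppress incident creation when the correlated event's entity and
    timestamp both fall inside an active maintenance window."""
    in_scope = event["entity"] in (window["devices"] | window["circuits"])
    in_time = window["start"] is not None and window["start"] <= event["time"] <= window["end"]
    return in_scope and in_time
```

A correlated event on a device named in the email, occurring inside the announced window, would be suppressed; the same event outside the window, or on an unlisted device, would still open an incident.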

The Brain Behind the Pings: Understanding the Synthetics Control Plane

In today’s interconnected world, a fundamental question plagues every network administrator and SRE: “Is my network running well?” The answer, often elusive, is precisely what synthetics aims to provide. By deploying a vast fleet of specialized probe agents, synthetics continuously monitors critical network health metrics, including latency, packet loss, jitter, and custom reachability checks, providing an unparalleled view into your network’s performance. While the core concept of sending pings might seem simple, the magic and the complexity lie within the control plane of a robust and scalable synthetics system. This isn’t just about sending ICMP packets; it’s about orchestrating a distributed system of thousands of probes to deliver accurate, real-time insights across a large-scale network.

Designing the Control Plane: The Core Pillars

A well-designed synthetics control plane is the backbone of its effectiveness. It’s what transforms raw ping data into actionable intelligence. This section explores the control plane mechanisms necessary for managing a network of synthetics monitoring probes; understanding them offers valuable perspective on how synthetics effectively handles distributed network monitoring.

Managing Large-Scale Agent Deployments

Imagine deploying thousands of synthetics probes across various data centers, cloud regions, and remote offices. The control plane is your central hub for this monumental task. It facilitates automated deployment, upgrades, and health monitoring of these numerous probes, ensuring they are always running and reporting as expected. This involves sophisticated deployment strategies, version control, and continuous health checks to identify and address any agent-related issues proactively. To start with, customers install a fleet of synthetics probes in their environment.
Selector synthetics probes can be installed as Linux Debian/RPM packages or as Docker containers. The probes can be installed on Linux or Windows hosts, networking switches, and routers. Customers often use Ansible or other fleet-management tools to do the install. The artifacts to install are fetched from the Selector SaaS platform. All the agents auto-connect to the SaaS instance on startup to register. Connected probes are marked as registered but do not participate in synthetics until further action is taken.

Synthetics is driven by inventory configuration on the Selector SaaS platform. All probes that register with the SaaS platform should be added to the synthetics inventory. Inventory for probes can be added even before the probes are installed. Synthetics probes maintain a constant connection to the Selector SaaS platform, transmitting continuous health data to confirm their operational status. To minimize host-system resource consumption, probes are designed to be lightweight. Additionally, they provide performance metrics for monitoring resource usage. Given their deployment within customer environments, probes proactively send critical logs to the SaaS platform. This ensures the availability of necessary data for debugging and enhanced visibility during triage. Examples of issues that can be alerted on based on these metrics:

Implementing Pivot-Based Probe Grouping

Raw data is only as good as its organization. The concept of pivot-based grouping is a powerful mechanism within the control plane that enables efficient organization and analysis of network monitoring data. Instead of just a flat list of probes, the control plane allows for the dynamic grouping of probes based on various “pivots”: geographical location, network segment, application served, or even custom tags.
This enables you to slice and dice your monitoring data to gain insights specific to certain parts of your infrastructure, enabling targeted troubleshooting and performance analysis. For example, you could quickly view all probes monitoring your e-commerce platform, or all probes within a specific metropolitan area. Pivots are configured based on the inventory columns. Customers can select one or more specific columns to define as the pivot. This creates mesh instances of probes participating in synthetics within their instance. A probe can participate in multiple mesh instances. In the example below, 12 probes participate in multiple meshes based on the pivot tags Tag1 and Tag2. The four meshes are:

Handling Configuration Management for Probes

Consistency is key in a distributed system. The control plane plays a vital role in ensuring consistent and up-to-date configurations are applied to all monitoring probes. This includes managing which metrics to collect, the frequency of pings, target endpoints, and other related details. Some example configurations that are supported and can be configured from Selector SaaS:

The configuration is synced from Selector SaaS to all the probes.

Summary

Fundamentally, the synthetics control plane serves as the essential operational mechanism, the sophisticated manager directing a complex system of network probes. It is the component that translates a basic concept into a robust, adaptable infrastructure for addressing the critical query: “Is network performance satisfactory?” By understanding these fundamental functionalities, one develops a more comprehensive understanding of the complex technical design that underpins efficient and thorough network observation.
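To illustrate the pivot-based grouping discussed earlier, here is a minimal sketch of how mesh instances could be derived from inventory rows and pivot columns. It is an assumption-laden toy: the inventory layout, the Tag1/Tag2 column names, and the mesh keying are invented for the example and do not reflect Selector’s internal data model.

```python
from collections import defaultdict


def build_meshes(inventory, pivot_columns):
    """Group probes into mesh instances keyed by (pivot column, value).

    A probe row can carry several pivot tags, so the same probe may
    appear in more than one mesh, as in the Tag1/Tag2 example above.
    """
    meshes = defaultdict(list)
    for probe in inventory:
        for col in pivot_columns:
            value = probe.get(col)
            if value is not None:
                meshes[(col, value)].append(probe["name"])
    return dict(meshes)


# Illustrative inventory: probe names, columns, and tag values are made up.
inventory = [
    {"name": "probe-1", "Tag1": "east", "Tag2": "prod"},
    {"name": "probe-2", "Tag1": "east", "Tag2": "dev"},
    {"name": "probe-3", "Tag1": "west", "Tag2": "prod"},
    {"name": "probe-4", "Tag1": "west", "Tag2": "dev"},
]
meshes = build_meshes(inventory, ["Tag1", "Tag2"])
# Pivoting on Tag1 and Tag2 yields four meshes:
# (Tag1, east), (Tag1, west), (Tag2, prod), (Tag2, dev)
```

Each resulting mesh is a set of probes that test against one another, and a probe such as probe-1 participates in two meshes at once, one per pivot tag it carries.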

Selector is Headed to Cisco Live – Come Find Us in San Diego

The countdown is on! From June 8-12, Selector is heading to Cisco Live 2025 in San Diego, and we are bringing more than just buzzwords. If you’re attending, make sure to swing by Booth #2619. Whether you’re deep in the weeds of operational chaos or exploring what’s next for AIOps, we’ve got something for you – live demos, expert advice, and a few surprises you won’t want to miss.

What’s Happening at the Booth

Let’s start with the fun stuff: we’re giving away three Mac Minis – one each day – plus custom graffiti-designed water bottles, exclusive Selector swag, and plenty of hands-on time with our platform. But beyond the giveaways, what you’ll get is a fresh look at how modern network operations should work. Our team will be onsite, running live demos that show how Selector helps teams cut through alert noise, pinpoint root causes in seconds, and ultimately get ahead of performance issues.

Want the VIP Treatment?

Skip the crowds and book a 1:1 meeting with our team. You’ll get a personalized walk-through of Selector’s platform, including:

Reserve your spot here:

Don’t Miss John Capobianco on Stage

We’re also proud to announce that John Capobianco, our Head of Developer Relations, will be presenting a breakout session titled: Artificial Intelligence-Powered Operations: Infrastructure Copilots and Root Cause Analysis (Session ID: CNCOBS-2002). John will unpack the real-world impact of AIOps – how infrastructure copilots, automation, and multi-domain correlation are changing the way teams operate. It’s an excellent opportunity to see what Selector looks like in action, from root cause to resolution. Check the Cisco Live mobile app for details and join the conversation in the Webex space after the session.

See You in San Diego!

Whether you’re coming for the swag, the tech, or the strategy, we’re excited to meet you in San Diego. If AI-powered network operations are on your radar this year, make Selector at Booth #2619 your first stop. See you at Cisco Live!

Preparing for the Autonomous Future

Throughout this blog series, we’ve followed how AI reshapes network operations – from foundational data harmonization to real-time correlation, from contextual insights to agent-driven automation, and most recently, to conversational access through natural language interfaces. But we haven’t reached the final destination. Everything we’ve covered so far—clean data, context, LLMs, AI agents, and accessibility—lays the groundwork for something bigger: a new era of predictive, autonomous network operations. So what does the future look like? And how can enterprises prepare to embrace it?

From Insights to Intelligent Action: The AI Maturity Curve

Many organizations today are still in the early stages of AI adoption. They may have telemetry or anomaly-detection visibility, but they still rely on humans to interpret data, triage alerts, and manually take action. Selector’s platform is purpose-built to help teams move up the AI maturity curve from visibility to insight, to action, and ultimately to autonomy. Each layer builds on the last. Data is enriched, correlated, and translated into natural language. Insights evolve into intelligent decisions, and increasingly, those decisions can be acted on in real time. This isn’t about replacing humans. It’s about building systems that can recognize patterns faster, respond earlier, and automate the routine so that teams can focus on the strategic.

Seeing Around Corners with Predictive Analytics

The future of network operations isn’t just faster resolution – it’s early detection and proactive prevention. Selector is already surfacing leading indicators of performance issues and risks before they escalate. By analyzing historical patterns alongside real-time telemetry, the platform can detect signs of instability: subtle shifts in traffic behavior, recurring error codes, signs of config drift, or trends in latency that often precede failures.
This predictive intelligence allows teams to act early, long before users feel the impact. In time, success will not be measured by how quickly you respond to incidents but by how rarely they happen at all.

The Shift Towards Self-Healing Networks

One of the most exciting developments is the move toward autonomous remediation. Selector’s agent framework, which already supports automated actions and integration with ITSM workflows, is evolving to support fully self-healing capabilities. Imagine this: a network process begins consuming abnormal amounts of memory. An agent notices the deviation, checks recent change logs, references historical fixes, confirms no maintenance conflicts, and restarts the device, without waiting for human intervention. It logs the event, notifies the team, and moves on. These self-correcting behaviors won’t emerge overnight, but they are well within reach. And because Selector builds explainability into every step, these actions are always traceable, auditable, and reversible, ensuring trust and accountability even in automated environments.

Scaling AI in Complex, Hybrid Environments

As enterprise networks grow more distributed, stretching across data centers, cloud platforms, edge locations, and remote branches, scaling AI becomes a challenge of its own. Selector is designed to meet this complexity head-on. With support for containerized deployments, hybrid cloud environments, and integrations across virtually any telemetry source, the platform is architected to scale AI horizontally – across domains, vendors, and regions – without needing to rewrite your monitoring stack or swap out tools. This architectural flexibility allows you to extend AI-powered operations across your entire footprint, not just within a single environment.

Preparing for the Autonomous Future

Full autonomy doesn’t require a giant leap; it just needs the proper first steps. And that journey starts now.
The groundwork is already in place: Selector’s Data Hypervisor ensures your telemetry is clean and enriched. Its Knowledge Service brings context through real-time correlation. LLMs and RAG deliver accurate, accessible insights. AI agents turn those insights into intelligent action. And Copilot makes it all available to anyone through natural conversation.

For enterprises looking to move toward more intelligent, resilient operations, now is the time to start operationalizing that stack. Begin by improving your data hygiene. Identify high-volume, low-risk workflows that could benefit from automation. Introduce AI agents with human-in-the-loop guardrails. And empower more of your team to access insights directly through Copilot. Each step you take gets you closer to a network that doesn’t just report problems but helps you solve them.

Final Thoughts: The Road Ahead

Autonomous network operations may sound futuristic, but the path to get there is already paved. Selector is helping enterprises walk that path, layer by layer, capability by capability. This isn’t about jumping to some far-off future. It’s about making your network smarter today—more observable, proactive, and responsive—with a clear, explainable AI foundation that can grow with you.

If you’re ready to move from dashboards and alerts to a future of self-optimizing infrastructure, Selector can help you take the next step. Request a demo to see how Selector is laying the foundation of autonomous network operations. Also, make sure to follow us on LinkedIn or X to stay informed about the latest news and blog posts from Selector.

The Control Plane Highway: Networking’s Hidden Infrastructure

When we discuss networks, we typically envision data packets racing along physical wires like vehicles on a highway. But beneath this visible traffic flows another critical pathway that few recognize: the control plane highway. This unseen infrastructure, where routing information flows between devices, makes the data highway possible. Before user data can flow, millions of paths must be established, creating a parallel network of equally vital importance.

Understanding networks through this dual-highway perspective doesn’t just satisfy intellectual curiosity—it transforms how we approach network design, troubleshooting, and optimization. By reframing our perception to see the control plane as a highway, we unlock powerful new opportunities for network intelligence, where machine learning and AI generate unprecedented analytics, revealing patterns and predictive insights that have remained largely untapped despite the control plane’s decades-long existence.

Reimagining Network Architecture

Service providers build networks that connect customers to their services and branch offices using layer-3 VPNs. At the network edge, routes are exchanged, then travel through a system of route reflectors before reaching the provider edge routers closest to each customer location. Route reflectors (RRs)—physical devices or virtual software—are organized in a carefully designed hierarchy. Network architects group them strategically and split routing tables so that no single RR must store all routes. Operators create specific rules for sharing routes to ensure each customer location gets all the routing information it needs while preventing any individual reflector from becoming overloaded. When customers connect from various global Points of Presence (PoPs), routing information travels through a carefully designed hierarchy.

The Control Plane Highway Visualization

Picture this intricate routing ecosystem as an expansive highway network displayed on a geographic map.
Each critical component—customer-edge routers, provider-edge routers, and route reflectors—appears as a distinct node positioned at its physical location. Routing information flows along directional arrows between nodes, with each arrow’s thickness proportional to the volume of routes being exchanged. This customer-specific visualization reveals the complete journey of routing data, from its origins at various entry points to its distribution across all customer locations. The topology showcases the dynamic propagation of routes as they flow through the hierarchical system, highlighting potential bottlenecks, redundant paths, and the efficiency of the route distribution architecture.

Analytics and Insights: The Control Plane Highway Monitoring System

When we visualize the control plane as a highway system, complex routing data becomes intuitively accessible. Instead of being scattered across multiple databases and router tables requiring specialized query language knowledge, information appears as an interactive map with nodes (network elements) and edges (sessions) that anyone can understand.

Traffic Control Centers: Monitoring Network Nodes

Every junction on our highway (routers and reflectors) becomes clickable, revealing critical operational data, just as in a modern traffic control center. These metrics translate visually into familiar traffic signals: green lights indicate healthy nodes, yellow-orange lights warn of developing issues, and red lights signal violations where route propagation may be incomplete or blocked, preventing customers from reaching all their global Points of Presence (PoPs).

Road Condition Monitoring: Analyzing Connection Edges

The highways connecting these junctions (BGP sessions between network elements) provide equally valuable insights.
Clicking on any road segment reveals the condition of that connection. Just as traffic engineers monitor road conditions, our visualization uses color-coding to indicate connection health: green for optimal flow, yellow-orange for concerning conditions, and dark red for critical issues blocking route dissemination. Importantly, since each BGP session typically carries routing data for multiple customers across shared infrastructure, the status of a single edge can reveal broader impacts affecting many customers while highlighting redundant paths that maintain connectivity despite local disruptions.

Patterns, Predictions, and Impact Analysis

The highway metaphor extends naturally to trend analysis and capacity planning. Just as traffic engineers study road usage patterns to anticipate future needs, network operators can visualize route growth and utilization over time. Since service providers use shared infrastructure for multiple customers, they must ensure that one customer’s behavior doesn’t create “traffic jams” affecting others. Using “what-if” analysis tools, operators can create virtual simulations for each customer, like digital twins, to analyze potential impacts before they occur. These simulations allow operators to observe how the control plane highway would respond if a customer exceeded current route limits or SLA thresholds: which paths would experience congestion first, how edge colors would shift from green to yellow to orange to red, and where bottlenecks might develop. This virtual environment enables strategic planning for customer growth and SLA management without risking disruption to the production network. It allows operators to test configuration changes, adjust session capacities, or add new RRs while ensuring service level commitments remain intact, all in a safe environment before implementing changes.
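The traffic-signal health model described above, where nodes and sessions turn green, yellow-orange, or red as route counts approach their limits, could be sketched roughly as follows. The metric names and thresholds here are illustrative assumptions, not Selector’s actual logic.

```python
# Hypothetical sketch: classify control-plane nodes (RRs, PE routers)
# and BGP-session edges into traffic-signal colors using simple
# threshold rules. Names, counts, and thresholds are illustrative.

def health_color(utilization: float, warn: float = 0.7, crit: float = 0.9) -> str:
    """Map a 0..1 utilization ratio (e.g. routes held vs. route limit)
    to a traffic-signal color."""
    if utilization >= crit:
        return "red"      # violation: propagation may be blocked
    if utilization >= warn:
        return "yellow"   # developing issue
    return "green"        # healthy

nodes = {
    "rr-1": {"routes_held": 850_000, "route_limit": 1_000_000},
    "pe-chicago": {"routes_held": 400_000, "route_limit": 1_000_000},
}
edges = {
    ("rr-1", "pe-chicago"): {"routes_advertised": 960_000,
                             "session_limit": 1_000_000},
}

for name, n in nodes.items():
    print(name, health_color(n["routes_held"] / n["route_limit"]))
for (a, b), e in edges.items():
    print(f"{a}->{b}",
          health_color(e["routes_advertised"] / e["session_limit"]))
```

In this framing, the “what-if” simulation described above amounts to re-running the same classification against projected route counts instead of observed ones and watching which edges change color first.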
Transforming Network Visibility Through Highway Visualization

The control plane highway metaphor transforms network monitoring by visualizing routing information flows as a dynamic system with traffic signals. This gives operators clear visibility while making complex architectures intuitive for all stakeholders, and it enables proactive management through trend analysis and simulation, ensuring robust control planes as networks grow increasingly complex. The approach is valuable for service providers and applies equally to enterprise networks, data centers, cloud environments, and any routing ecosystem. It makes the invisible infrastructure that powers our connected world visible and optimized for future challenges.

Request a demo today to see how Selector visualizes the control plane like never before, empowering your team with deep routing insights and predictive analytics. To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.

Making Network Intelligence Accessible to Everyone

For years, network operations have relied on complex query languages that demand specialized knowledge. Extracting insights from network data often meant writing intricate commands in formats like SQL, a skill reserved for seasoned IT professionals. But what if anyone, regardless of expertise, could ask a simple question and get immediate, accurate answers from their network? That’s precisely what Selector Copilot makes possible. With natural language querying (NLQ), teams can interact with their network the same way they interact with AI-powered chat assistants like ChatGPT or Google Gemini. Instead of struggling with syntax, they can simply ask, “What configuration changes were made in the last 24 hours?” and get a clear, actionable response.

This shift is more than just a convenience. It fundamentally changes how teams troubleshoot, analyze, and manage network operations. It’s the culmination of the layered intelligence we’ve explored throughout this series, starting with harmonized, enriched data, then correlation and context, and finally LLM-powered insights and AI agents. With that foundation in place, Copilot brings it all together in the most human way possible: conversation.

Complex Queries Are Hurting Your IT Efficiency

Network operations generate massive amounts of data, but accessing that data traditionally requires deep technical skill. Traditional query languages like SQL force teams to spend valuable time crafting the right commands instead of focusing on solving problems. This complexity creates bottlenecks for non-technical stakeholders. If an operations manager needs to check system performance trends or verify whether a config change introduced instability, they must wait for a network engineer to write the proper query. That delay slows down decision-making and response time. Even within technical teams, typically only a select few experts know how to extract insights from the data effectively.
Copilot addresses this head-on by making insights universally accessible.

Network Troubleshooting Shouldn’t Require an SQL Expert

Selector Copilot eliminates these barriers by automatically translating natural language into S2QL (Selector Software Query Language) queries. Instead of memorizing syntax, users can type their questions in plain English, and Selector Copilot converts them into the appropriate commands. This capability is powered by Natural Language Translation (NLT), which ensures that questions are understood and contextualized based on your network telemetry. Whether troubleshooting a performance issue or analyzing trends, users can get the information they need without expertise in query languages. And it’s not just passive responses: Selector’s AI can also explain why something is happening.

Behind the AI: How Selector Copilot Makes Network Data Instantly Accessible

Selector’s Copilot uses a hybrid AI architecture combining local and cloud-based Large Language Models (LLMs) to ensure accuracy, performance, and security. This hybrid AI approach ensures that users receive accurate, human-readable insights in seconds, transforming complex network analysis into a seamless, intuitive experience.

Why AI-Powered Cloud Models Deliver Deeper Network Intelligence

While local LLMs handle query translation and processing, cloud LLMs take network insights to the next level by refining and visualizing data. These advanced AI models summarize key findings, identify patterns, and recommend next steps. Beyond intelligence, security remains a top priority. Customer-specific vector stores isolate sensitive network data, while enterprise-grade security measures prevent unauthorized access. Unlike public AI models, enterprise-grade cloud LLMs like Google Gemini 1.5 Pro do not train on customer queries or responses, eliminating concerns about data leakage.
By combining the speed of local AI processing with the depth of cloud-powered insights, Selector Copilot delivers a best-in-class network intelligence experience.

Faster Answers, Smarter Decisions: The Impact of AI in Network Ops

The ability to interact with network data using natural language improves efficiency and democratizes access to information across entire organizations. Network engineers can troubleshoot issues faster without spending time on query syntax. IT operations teams can easily identify performance trends and diagnose network slowdowns. Even executives and operations managers can retrieve key performance insights in real time without relying on IT experts. Instead of writing a technical query, a user can simply ask: “What devices did User X access?” The result? Faster insights, improved collaboration, and a more efficient approach to network operations.

The Future of Network Operations with AI and Natural Language

AI-driven network management is evolving rapidly, transforming how teams monitor, troubleshoot, and optimize their infrastructure. As networks become more complex, organizations need solutions that reduce operational friction and streamline decision-making. Natural language querying is at the center of this shift. Soon, AI-powered assistants like Selector Copilot will go beyond answering queries, instead predicting issues before they occur, automating resolutions, and providing proactive insights. Enterprises will move from reactive troubleshooting to AI-assisted network intelligence, where problems are identified and addressed before they impact performance.

Scaling natural language querying across an enterprise requires seamless integration into existing workflows. By embedding AI-powered network intelligence into collaboration tools like Slack, Microsoft Teams, and ITSM platforms, organizations can break down silos and make real-time network insights accessible across teams.
The shift toward self-service analytics will empower technical and non-technical users alike, ensuring that everyone, from engineers to executives, can make informed decisions without relying on a handful of internal experts. For enterprises looking to stay ahead of the curve, adopting AI-driven natural language querying today is more than a convenience. It is a strategic advantage that will define the next era of network operations.

Your Network Has All The Answers – Now You Can Finally Hear Them

With Selector Copilot, speaking to your network is now a reality. Teams can retrieve insights, diagnose issues, and analyze trends using natural language, eliminating the need for specialized query skills. This isn’t just an incremental improvement. It’s a fundamental shift in how organizations manage their networks, enabling faster decision-making, better collaboration, and a future-proof approach to network intelligence. Now is the time to embrace AI-powered network intelligence so your team can focus on solving problems instead of writing queries.

Request a demo today to see how Selector Copilot can simplify network queries and supercharge your network operations. To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X.

Unlocking the Power of LLMs and AI Agents for Network Automation

Artificial intelligence is reshaping how enterprises manage and secure their networks, but not all AI is created equal, and not all Large Language Models (LLMs) are ready for the job. While tools like ChatGPT and Google Gemini are transforming communication and productivity, applying general-purpose LLMs to something as specialized and high-stakes as network operations is an entirely different challenge.

Networks are dynamic, complex, and context-heavy. They’re built on domain-specific terminology, vendor-specific configurations, and constantly shifting operational states. You can’t just drop a generic chatbot on top of a network and expect meaningful results – at least, not without serious help.

Our previous post in this series explored how Selector connects the dots across telemetry sources using machine learning and correlation models. That context – unified, enriched, and real-time – is key to unlocking the next stage in AI-powered operations: safe, explainable automation through LLMs and intelligent agents.

Why Generic LLMs Fall Short in Networking

At their core, LLMs are pattern recognition engines trained on massive amounts of public data. They excel at predicting language, but in enterprise environments, especially networks, context matters more than language. A network operator asking, “What changed in the last 24 hours on the Chicago routers?” needs a precise, data-driven response – not a guess, a hallucination, or a generic how-to. Without access to real-time, domain-specific information about your network environment, LLMs struggle to provide relevant or accurate answers. They weren’t trained on your topology, logs, configurations, or business-critical thresholds. That’s where Selector’s approach stands apart.

Bridging the Gap with RAG: Domain-Specific Intelligence for LLMs

Selector uses Retrieval-Augmented Generation (RAG) to make LLMs useful for networking.
Instead of relying on a static training set, RAG dynamically injects real-time, domain-specific context into the query process. Here’s how it works: when a user asks a question – say, “Why is there packet loss between San Jose and Atlanta?” – Selector doesn’t just send that prompt to the LLM. It first retrieves relevant logs, metrics, alerts, and event data from its unified telemetry layer. Then, that data is fed into the LLM along with the original question, grounding the model’s response in your network’s actual state. This combination of contextual retrieval and natural language generation makes Selector’s Copilot truly different. It’s not just smart. It’s informed. It delivers relevant, accurate, and actionable responses because they’re rooted in your real-time environment.

From Conversations to Actions: The AI Agent Framework

Selector doesn’t stop at insights. Its agent framework takes things further by connecting conversational queries with automated workflows. Once the LLM has identified a likely root cause or recommended action, the system can pass that result to an intelligent agent. These agents are configured to interact with your systems of record (like ServiceNow or Jira), generate a change ticket, initiate a remediation script, or even trigger an automated fix. This unlocks a new operational model: chat-to-action. A network engineer can say, “Restart the affected interface if the error rate exceeds 10%,” and the system will not only understand the intent but also validate the logic, retrieve relevant thresholds, and execute the command (or route it for approval). With this framework, Selector makes AI not just a source of insight but a trusted operational partner.

Explainability Built In

In high-stakes environments like networking, blind trust in AI isn’t an option. Selector understands this. Every insight generated – whether by the correlation engine, the LLM, or an AI agent – comes with a clear, traceable rationale.
Users can always see what data was used, how it was processed, and why a particular answer or action was recommended. This transparency builds trust and supports human-in-the-loop operations, where engineers remain in control while AI handles the heavy lifting.

Fast, Accurate, Cost-Effective

Unlike many platforms that require extensive retraining or GPU-heavy infrastructure, Selector’s architecture is designed for practical, scalable deployment. By combining lightweight local inference (via tuned local models) with selective cloud-based LLM calls, Selector minimizes compute cost while maximizing responsiveness and accuracy. This hybrid approach means organizations get real-world value from LLMs without the resource drain of running massive models internally.

The Future of Network Automation is Conversational

LLMs are changing the way people interact with information. Selector is changing the way people interact with their networks. By combining enriched, harmonized telemetry with domain-specific LLM intelligence and an AI agent framework, Selector enables a future where asking your network for answers – or actions – is as easy as typing a question. It’s automation with context, intelligence with transparency, and AI you can actually use today.

Want to see what it feels like to talk to your network? Try our free Packet Copilot and explore how natural language redefines what’s possible in network operations. And make sure to follow us on LinkedIn or X to be notified of the next post in our series, where we look at how natural language interfaces like Selector Copilot are making complex queries accessible to everyone – turning command lines into conversations.

Breaking Down Silos with Correlation and Context

In modern IT environments, data is abundant, but clarity is rare. Enterprises deploy dozens of monitoring tools to collect metrics, events, and logs from across the network, yet when something goes wrong, teams still scramble to connect the dots. Why? Because these data streams exist in silos, isolated by format, source, or system.

In the first blog in this series, we explored how Selector’s Data Hypervisor tackles that problem head-on – normalizing and enriching raw telemetry into a unified, context-rich data layer. It’s the essential first step in enabling AI-powered operations. But harmonized data alone doesn’t solve the problem. To truly understand what’s happening in a dynamic, multi-domain network, you need to go one step further: correlation.

Each tool offers a limited view, forcing operators to conduct time-consuming investigations, switch between dashboards, review log entries, and chase alerts that may or may not be connected. The result? Slow resolution times, alert fatigue, and increased operational risk. To truly unlock the promise of AIOps, we must go beyond data collection and into data correlation—the process of transforming fragmented telemetry into meaningful, actionable context.

The Problem with Isolated Alerts

In a typical NOC scenario, a sudden service degradation can trigger dozens of independent alerts: one from a switch reporting high interface errors, log messages indicating that routing adjacencies have changed, another from synthetics noting a drop in performance, and additional alarms from adjacent devices. Each signal tells a piece of the story, but without context, operators are left guessing which is the root cause and which are symptoms. This is the cost of disconnected data. Legacy systems treat metrics, synthetics, logs, and events as standalone entities. They miss the relationships across time, topology, and service layers that define what’s happening.
As networks become more complex and distributed, this lack of correlation worsens.

From Signals to Story: How Selector Connects the Dots

Selector transforms this broken workflow by delivering real-time, cross-domain correlation through AI and machine learning. It doesn’t just collect telemetry. It understands it. At the heart of this capability is Selector’s Knowledge Service, which ingests enriched telemetry from the Data Hypervisor and uses a combination of recommender models and association models to uncover hidden relationships. By correlating across time series data, logs, alerts, and events, Selector tells a coherent story about what happened, when it happened, and why. Whether the source is SNMP, syslog, ThousandEyes, vManage, or another telemetry stream, Selector doesn’t treat them as isolated channels. It normalizes and enriches the data, aligns it by timestamp and topology, and uses ML models to group related anomalies into a single, actionable insight.

The Power of S2QL and ML-Driven Correlation

Selector’s engine is powered by the Selector Software Query Language (S2QL), a purpose-built query language that enables deep correlation across diverse data types. Combined with machine learning models trained to identify patterns, clusters, and outliers, Selector rapidly detects both known and novel issues. The result is faster MTTI, MTTD, and MTTR – because operators don’t have to waste time digging through layers of alerts to find what matters. They’re given the root cause upfront, with all supporting evidence attached.

Explainable AI: Trust Built In

While automation and AI are powerful, they must also be transparent and accountable. Selector builds human-in-the-loop explainability into every step. Each insight is accompanied by a clear rationale, traceable data sources, and the ability to drill down into contributing metrics, events, or logs. Not only can operators trust the result, but they can also understand how it was reached.
This creates confidence in the system’s recommendations, accelerates adoption, and reduces resistance to AI-powered workflows.

Real-World Impact – Without the Wait

Unlike legacy platforms that require months of customization, training, or expensive GPU infrastructure, Selector delivers value in days. Its plug-and-play integrations and low-code configuration make it fast to deploy, even in large, complex environments. And because the platform is built on Kubernetes and supports SaaS and on-prem deployments, it can scale up or down depending on your organization’s needs with no disruption to existing workflows.

Correlation is the Bridge to AI-Driven Automation

Correlation isn’t just about better alerts. It’s the gateway to autonomous operations. When systems can understand the full context of an issue, they can begin to anticipate failures, recommend remediations, or even trigger automated workflows through integrations like Itential, ServiceNow, or PagerDuty. Selector’s architecture is built for this future. The same models that correlate anomalies today will drive predictive alerts, capacity forecasting, and self-healing actions tomorrow.

If your team is drowning in alerts but starving for answers, it’s time to shift from fragmented signals to unified stories. Selector enables context-aware correlation that’s fast, explainable, and designed for modern network complexity. Schedule a demo to see how Selector can help your team move from noise to knowledge, and follow us on LinkedIn or X to be notified of the next blog in “The Path to AI-Powered Network Operations,” where we will explore how Selector leverages LLMs and AI agents to unlock the next level of intelligent automation.

Why Data Harmonization is Critical to Your AIOps Strategy

Picture this: your phone rings in the middle of the night. It’s your engineering lead, calling to inform you of a significant outage affecting your customer-facing services. As your network operations team jumps into action, they’re greeted with chaos. Over 40 alerts flood their screens simultaneously. Your network monitoring, infrastructure monitoring, and application performance monitoring tools all fire independently, each with its own dashboard, presenting data in incompatible formats.

It’s like trying to solve a jigsaw puzzle blindfolded, with pieces scattered across multiple rooms and no map of how they connect. Without full-stack visibility across all layers, valuable time is lost trying to piece together the fragmented clues, which prolongs downtime and costs businesses thousands of dollars per minute. The longer it takes to identify the root cause, the longer your customers and revenue remain impacted. In scenarios like these, disconnected data isn’t just inconvenient. It’s financially devastating.

Why Disconnected Data Is Holding Back Network Operations

Today’s enterprises are drowning in data but starving for insights. Operations teams, like the one in the example above, face the daunting challenge of managing massive volumes of telemetry from across the entire technology stack, spanning network hardware, infrastructure platforms, and distributed applications. Each system and vendor produces data in a different format, resulting in isolated information scattered across dozens of dashboards and tools. As data volumes surge, troubleshooting becomes not only overwhelming but often nearly impossible.

At the core, this isn’t just a complexity problem. It’s a data quality problem. Before organizations can leverage advanced technology like Artificial Intelligence for IT Operations (AIOps), they must first confront a foundational yet often overlooked challenge: data harmonization.
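To make the harmonization problem concrete, here is a minimal, hypothetical sketch of the first step any such platform must perform: mapping two incompatible vendor alert formats into one unified schema. All field names are invented for illustration and are not Selector’s actual data model.

```python
# Hypothetical sketch: normalize two vendor-specific alert formats
# into one unified schema. Field names are invented for illustration.

from datetime import datetime, timezone

def normalize_vendor_a(raw: dict) -> dict:
    # Vendor A reports epoch seconds and uppercase severity strings.
    return {
        "timestamp": datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        "device": raw["host"],
        "severity": raw["sev"].lower(),
        "message": raw["msg"],
    }

def normalize_vendor_b(raw: dict) -> dict:
    # Vendor B reports ISO-8601 timestamps and numeric severity levels.
    return {
        "timestamp": datetime.fromisoformat(raw["time"]),
        "device": raw["node_name"],
        "severity": {1: "critical", 2: "warning", 3: "info"}[raw["level"]],
        "message": raw["description"],
    }

alerts = [
    normalize_vendor_a({"epoch": 1700000000, "host": "core-sw-1",
                        "sev": "CRITICAL", "msg": "interface errors rising"}),
    normalize_vendor_b({"time": "2023-11-14T22:13:20+00:00",
                        "node_name": "core-sw-1",
                        "level": 1, "description": "BGP adjacency change"}),
]
# Both alerts now share one schema, so they can be sorted, joined,
# and enriched (site, circuit ID, customer) on common keys.
print(sorted(a["message"] for a in alerts))
```

Once every stream shares one schema like this, contextual enrichment (labels for location, circuit IDs, customer name) can simply be attached as additional keys on the same records, which is the idea the next section develops.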
Introducing Selector’s Data Hypervisor: Your Path to Unified Data

Selector recognized early on that before AI could revolutionize network operations, enterprises first needed a smarter, more unified way to handle their data. That’s why Selector built its unique Data Hypervisor technology – an innovative approach that transforms the way organizations ingest, enrich, and leverage network, infrastructure, and application data across all seven layers of the stack.

Much like a virtualization hypervisor decouples virtual machines from the underlying physical hardware, Selector’s Data Hypervisor decouples your diverse data sources from their native formats. The hypervisor ingests every type of operational data imaginable – logs, metrics, events, configurations, operational states, and flow data from networks, infrastructure, and applications – then automatically normalizes and enriches this data to provide a unified, vendor-agnostic view. This normalization makes previously siloed data streams ready for advanced analytics and unified dashboards, eliminating the need for costly manual correlation.

But normalization is only part of the story. The Data Hypervisor also enriches incoming data with critical contextual metadata, such as labels indicating location, peer devices, circuit IDs, customer name, or application relationships, making the data more meaningful. Context transforms isolated events into actionable insights, bridging the gaps between siloed tools and datasets.

How Selector Uses Machine Learning to Automate Data Enrichment

Traditional methods for parsing and enriching data often depend on rigid rules and manually maintained regular expressions – a fragile, maintenance-intensive approach. Selector’s Data Hypervisor replaces these outdated methods with machine learning models that automatically interpret and structure unstructured or semi-structured data.
Rather than needing thousands of handcrafted parsing rules, Selector’s ML-driven approach quickly and accurately extracts relevant information, categorizes events, identifies anomalies, and clusters related issues. This capability drastically reduces manual overhead and error rates, enabling IT teams to shift their focus from managing data to solving actual problems. And this isn’t just theoretical: Selector customers consistently achieve drastic reductions in alert noise – up to a 98% reduction in ticket volume – enabling teams to focus immediately on real issues.

Laying the Foundation for AI-Driven Insights

Selector’s approach to data harmonization is more than just operational convenience. It is essential groundwork for full-stack, AI-driven network operations. Machine learning research consistently emphasizes that preprocessing raw data is a challenging and time-consuming task that directly impacts model performance, with a substantial portion of raw data requiring transformation before it becomes useful for AI applications. Selector’s meticulous data enrichment and normalization significantly enhance the usability of data collected from all layers, ensuring that the resulting insights and predictions are accurate, actionable, and trustworthy.

Furthermore, Selector’s solution delivers immediate value. Unlike traditional approaches that require months of extensive setup, Selector can begin providing insights within days, without the need for massive infrastructure investments such as GPUs. This rapid time-to-value, combined with cost efficiency, makes Selector not only powerful but also practical for businesses looking to make AI-driven operations a reality.

What’s Next: From Unified Data to Autonomous Networks

Effective AIOps isn’t just about adopting AI tools. It’s about thoughtfully preparing your infrastructure to support them.
Selector’s Data Hypervisor clears away the chaos, laying a robust foundation for next-level AI applications such as automated correlation, natural language querying, conversational interfaces, and autonomous network operations.

In our next blog, we’ll explore how Selector leverages machine learning to correlate network events in real time, unlocking automated insights and laying the groundwork for predictive analytics and AI-driven automation.

Ready to transform your network operations? Schedule a demo today to see Selector in action, and follow us on LinkedIn or X to be notified of the next blog in this series as we continue your journey toward autonomous network management.
