In modern IT environments, data is abundant, but clarity is rare. Enterprises deploy dozens of monitoring tools to collect metrics, events, and logs from across the network, yet when something goes wrong, teams still scramble to connect the dots. Why? Because these data streams exist in silos, isolated by format, source, or system. Each tool offers a limited view, forcing operators to conduct time-consuming investigations, switch between dashboards, review log entries, and chase alerts that may or may not be connected. The result? Slow resolution times, alert fatigue, and increased operational risk.

In the first blog in this series, we explored how Selector’s Data Hypervisor tackles that problem head-on, normalizing and enriching raw telemetry into a unified, context-rich data layer. It’s the essential first step in enabling AI-powered operations. But harmonized data alone doesn’t solve the problem. To truly understand what’s happening in a dynamic, multi-domain network, you need to go one step further: correlation. To unlock the promise of AIOps, we must move beyond data collection and into data correlation, the process of transforming fragmented telemetry into meaningful, actionable context.

The Problem with Isolated Alerts

In a typical NOC scenario, a sudden service degradation can trigger dozens of independent alerts: one from a switch reporting high interface errors, log messages indicating that routing adjacencies have changed, another from synthetics noting a drop in performance, and additional alarms from adjacent devices. Each signal tells a piece of the story, but without context, operators are left guessing which is the root cause and which are symptoms.

This is the cost of disconnected data. Legacy systems treat metrics, synthetics, logs, and events as standalone entities. They miss the relationships across time, topology, and service layers that define what’s happening.
As networks become more complex and distributed, this lack of correlation only worsens.

From Signals to Story: How Selector Connects the Dots

Selector transforms this broken workflow by delivering real-time, cross-domain correlation through AI and machine learning. It doesn’t just collect telemetry; it understands it. At the heart of this capability is Selector’s Knowledge Service, which ingests enriched telemetry from the Data Hypervisor and uses a combination of recommender models and association models to uncover hidden relationships. By correlating across time-series data, logs, alerts, and events, Selector tells a coherent story about what happened, when it happened, and why.

Whether the source is SNMP, syslog, ThousandEyes, vManage, or another telemetry stream, Selector doesn’t treat them as isolated channels. It normalizes and enriches the data, aligns it by timestamp and topology, and uses ML models to group related anomalies into a single, actionable insight.

The Power of S2QL and ML-Driven Correlation

Selector’s engine is powered by the Selector Software Query Language (S2QL), a purpose-built query language that enables deep correlation across diverse data types. Combined with machine learning models trained to identify patterns, clusters, and outliers, Selector rapidly detects both known and novel issues. The result is shorter MTTI, MTTD, and MTTR, because operators don’t have to waste time digging through layers of alerts to find what matters. They’re given the root cause upfront, with all supporting evidence attached.

Explainable AI: Trust Built In

While automation and AI are powerful, they must also be transparent and accountable. Selector builds human-in-the-loop explainability into every step. Each insight is accompanied by a clear rationale, traceable data sources, and the ability to drill down into contributing metrics, events, or logs. Operators can not only trust the result, they can understand how it was reached.
This creates confidence in the system’s recommendations, accelerates adoption, and reduces resistance to AI-powered workflows.

Real-World Impact – Without the Wait

Unlike legacy platforms that require months of customization, training, or expensive GPU infrastructure, Selector delivers value in days. Its plug-and-play integrations and low-code configuration make it fast to deploy, even in large, complex environments. And because the platform is built on Kubernetes and supports both SaaS and on-prem deployments, it can scale up or down with your organization’s needs, with no disruption to existing workflows.

Correlation Is the Bridge to AI-Driven Automation

Correlation isn’t just about better alerts. It’s the gateway to autonomous operations. When systems understand the full context of an issue, they can begin to anticipate failures, recommend remediations, or even trigger automated workflows through integrations like Itential, ServiceNow, or PagerDuty. Selector’s architecture is built for this future: the same models that correlate anomalies today will drive predictive alerts, capacity forecasting, and self-healing actions tomorrow.

If your team is drowning in alerts but starving for answers, it’s time to shift from fragmented signals to unified stories. Selector enables context-aware correlation that’s fast, explainable, and designed for modern network complexity. Schedule a demo to see how Selector can help your team move from noise to knowledge, and follow us on LinkedIn or X to be notified of the next blog in “The Path to AI-Powered Network Operations,” where we will explore how Selector leverages LLMs and AI agents to unlock the next level of intelligent automation.
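To make the kind of cross-domain correlation described above concrete, here is a minimal sketch of grouping anomalies into incidents by time proximity and shared topology. The record fields, the 60-second window, and the grouping rule are illustrative assumptions, not Selector’s actual models:

```python
# Hypothetical anomaly records; the schema is invented for illustration.
anomalies = [
    {"ts": 100, "device": "sw1",  "signal": "interface_errors"},
    {"ts": 103, "device": "sw1",  "signal": "bgp_adjacency_change"},
    {"ts": 105, "device": "rtr2", "signal": "latency_spike"},
    {"ts": 900, "device": "sw7",  "signal": "fan_failure"},
]

# Toy topology: which devices are directly adjacent to each other.
adjacency = {"sw1": {"rtr2"}, "rtr2": {"sw1"}, "sw7": set()}

def correlate(anomalies, adjacency, window=60):
    """Group anomalies that are close in time AND related in topology."""
    incidents = []
    for a in sorted(anomalies, key=lambda x: x["ts"]):
        for inc in incidents:
            related = any(
                b["device"] == a["device"]
                or b["device"] in adjacency.get(a["device"], set())
                for b in inc
            )
            if related and a["ts"] - inc[-1]["ts"] <= window:
                inc.append(a)
                break
        else:
            incidents.append([a])
    return incidents

incidents = correlate(anomalies, adjacency)
```

A real engine weighs far more signals (service layers, learned associations, historical co-occurrence), but even this toy version collapses the three related alerts on sw1 and rtr2 into one incident while keeping the unrelated fan alarm separate.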
Picture this: Your phone rings in the middle of the night. It’s your engineering lead, calling to inform you of a significant outage affecting your customer-facing services. As your network operations team jumps into action, they’re greeted with chaos. Over 40 alerts flood their screens simultaneously. Your network, infrastructure, and application performance monitoring tools all fire independently, each with its own dashboard, each presenting data in incompatible formats. It’s like trying to solve a jigsaw puzzle blindfolded, with pieces scattered across multiple rooms and no map of how they connect.

Without full-stack visibility across all layers, valuable time is lost piecing together the fragmented clues, which prolongs downtime and costs businesses thousands of dollars per minute. The longer it takes to identify the root cause, the longer your customers and revenue remain impacted. In scenarios like these, disconnected data isn’t just inconvenient. It’s financially devastating.

Why Disconnected Data Is Holding Back Network Operations

Today’s enterprises are drowning in data but starving for insights. Operations teams, like the one in the example above, face the daunting challenge of managing massive volumes of telemetry from across the entire technology stack, spanning network hardware, infrastructure platforms, and distributed applications. Each system and vendor produces data in a different format, resulting in isolated information scattered across dozens of dashboards and tools. As data volumes surge, troubleshooting becomes not just overwhelming but often nearly impossible.

At its core, this isn’t just a complexity problem. It’s a data quality problem. Before organizations can leverage advanced technology like Artificial Intelligence for IT Operations (AIOps), they must first confront a foundational yet often overlooked challenge: data harmonization.
Introducing Selector’s Data Hypervisor: Your Path to Unified Data

Selector recognized early on that before AI could revolutionize network operations, enterprises first needed a smarter, more unified way to handle their data. That’s why Selector built its unique Data Hypervisor technology, an innovative approach that transforms the way organizations ingest, enrich, and leverage network, infrastructure, and application data across all seven layers of the stack.

Much like a virtualization hypervisor decouples physical hardware from the virtual machines that run on it, Selector’s Data Hypervisor decouples your diverse data sources from their native formats. The hypervisor ingests every type of operational data imaginable, including logs, metrics, events, configurations, operational states, and flow data from networks, infrastructure, and applications, then automatically normalizes and enriches this data to provide a unified, vendor-agnostic view. This data normalization makes previously siloed data streams ready for advanced analytics and unified dashboards, eliminating the need for costly manual correlation.

But normalization is only part of the story. The Data Hypervisor also enriches incoming data with critical contextual metadata, such as labels indicating location, peer devices, circuit IDs, customer name, or application relationships, making the data more meaningful. Context transforms isolated events into actionable insights, bridging the gaps between siloed tools and datasets.

How Selector Uses Machine Learning to Automate Data Enrichment

Traditional methods for parsing and enriching data often depend on rigid rules and manually maintained regular expressions, a fragile, maintenance-intensive approach. Selector’s Data Hypervisor replaces these outdated methods with machine learning models that automatically interpret and structure unstructured or semi-structured data.
Rather than needing thousands of handcrafted parsing rules, Selector’s ML-driven approach quickly and accurately extracts relevant information, categorizes events, identifies anomalies, and clusters related issues. This capability drastically reduces manual overhead and error rates, enabling IT teams to shift their focus from managing data to solving actual problems. This isn’t just theoretical: Selector customers consistently achieve dramatic reductions in alert noise, up to a 98% reduction in ticket volume, enabling teams to focus immediately on real issues.

Laying the Foundation for AI-Driven Insights

Selector’s approach to data harmonization is more than an operational convenience. It is essential groundwork for full-stack, AI-driven network operations. Machine learning research consistently emphasizes that preprocessing raw data is a challenging, time-consuming task that directly impacts model performance, with a substantial portion of raw data requiring transformation before it becomes useful for AI applications. Selector’s meticulous data enrichment and normalization significantly enhance the usability of data collected from all layers, ensuring that the resulting insights and predictions are accurate, actionable, and trustworthy.

Furthermore, Selector’s solution delivers immediate value. Unlike traditional approaches that require months of extensive setup, Selector can begin providing insights within days, without the need for massive infrastructure investments such as GPUs. This rapid time-to-value, combined with cost efficiency, makes Selector not only powerful but practical for businesses looking to make AI-driven operations a reality.

What’s Next: From Unified Data to Autonomous Networks

Effective AIOps isn’t just about adopting AI tools; it’s about thoughtfully preparing your infrastructure to support them.
Selector’s Data Hypervisor clears away the chaos, laying a robust foundation for next-level AI applications such as automated correlation, natural language querying, conversational interfaces, and autonomous network operations. In our next blog, we’ll explore how Selector leverages machine learning to correlate network events in real time, unlocking automated insights and laying the groundwork for predictive analytics and AI-driven automation.

Ready to transform your network operations? Schedule a demo today to see Selector in action, and follow us on LinkedIn or X to be notified of the next blog in this series as we continue your journey toward autonomous network management.
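The normalize-then-enrich flow described in this post can be sketched in a few lines. The field names, unified schema, and metadata source below are assumptions for illustration only, not the Data Hypervisor’s actual implementation:

```python
# Two telemetry records in vendor-specific shapes (invented for illustration).
snmp_record = {"sysName": "edge-rtr-01", "ifInErrors": 42, "epoch": 1700000000}
syslog_record = {"host": "edge-rtr-01",
                 "msg": "%BGP-5-ADJCHANGE: neighbor Down",
                 "time": 1700000003}

# Hypothetical contextual metadata, e.g. pulled from a CMDB.
metadata = {"edge-rtr-01": {"site": "ashburn", "role": "edge", "customer": "acme"}}

def normalize(record):
    """Map vendor-specific fields onto one unified, vendor-agnostic schema."""
    if "sysName" in record:  # looks like an SNMP poll result
        return {"device": record["sysName"], "ts": record["epoch"],
                "type": "metric", "body": {"ifInErrors": record["ifInErrors"]}}
    return {"device": record["host"], "ts": record["time"],
            "type": "log", "body": {"message": record["msg"]}}

def enrich(event, metadata):
    """Attach contextual labels so downstream correlation can reason about them."""
    return {**event, "labels": metadata.get(event["device"], {})}

unified = [enrich(normalize(r), metadata) for r in (snmp_record, syslog_record)]
```

Once both streams share one schema and carry the same labels, a dashboard or correlation model can treat the SNMP metric and the syslog message as two views of the same device, which is exactly what makes cross-silo analytics possible.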
The 2025 Gartner Market Guide for Event Intelligence Solutions arrives at a critical time for organizations facing increasing complexity in managing IT events. Today’s diverse, distributed IT environments create significant operational challenges, including alert fatigue, fragmented tools, and slow incident response, impacting both efficiency and customer experience. Event Intelligence Solutions (EIS) address these challenges directly, employing AI and advanced analytics to simplify, accelerate, and automate event management.

What Gartner’s Event Intelligence Solution Requirements Mean for Your Business

Gartner identifies five mandatory features that an Event Intelligence Solution should include to overcome the most pressing challenges faced by IT and network teams.

How Selector Aligns with Gartner’s Event Intelligence Solution Requirements

Selector’s inclusion as a representative vendor in Gartner’s 2025 Market Guide for Event Intelligence Solutions highlights its position as an innovative leader. Through powerful, differentiated capabilities, Selector not only meets but surpasses Gartner’s essential criteria.

Unified Cross-Domain Visibility: Selector’s advanced integration capabilities ingest telemetry data seamlessly from over 300 sources. This comprehensive integration breaks down data silos and provides unprecedented visibility across the entire IT landscape.

Dynamic Topology Mapping and Digital Twin: Selector dynamically visualizes and continuously updates the topology of network, infrastructure, and service dependencies. Selector’s Digital Twin also enables users to model hypothetical scenarios, predict potential failures, and optimize resource allocation, providing strategic insights beyond basic visualization.

ML-Driven Event Correlation and Contextual Enrichment: Selector employs sophisticated machine learning algorithms to correlate events intelligently, reducing noise by up to 95%.
Selector’s advanced event correlation, root cause analysis, and smart alerting empower teams with actionable, enriched context directly within their collaboration tools, streamlining incident response.

Predictive Analytics and Anomaly Detection: Selector sets itself apart through powerful predictive analytics capabilities. Its AI-driven forecasting and anomaly detection proactively identify issues, significantly reducing incidents by anticipating disruptions before they impact services.

Selector Copilot and Automated Remediation: Selector’s innovative Copilot uses conversational AI and natural language models to simplify complex incident investigations through intuitive, plain-language interactions. This makes advanced analytics accessible across the entire organization. Additionally, Selector integrates seamlessly with automation platforms and ITSM solutions, automating incident remediation workflows.

Selector’s Strategic Advantage

Selector continuously innovates, delivering a scalable, intuitive platform that enhances resilience, minimizes downtime, and accelerates incident response. As organizations navigate increasingly complex IT environments, Selector positions teams to tackle today’s operational challenges effectively and to prepare proactively for future demands.

To learn more about how Selector aligns with Gartner’s strategic insights and can drive significant value for your organization, schedule a demo with one of our network experts. To stay up to date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.
Network operations have become increasingly complex due to the distributed nature of modern applications, which draw on data from private data centers, public clouds, and the internet to provide end-user services. With the adoption of these multi-cloud, multi-tier application architectures, network engineers must integrate new services from cloud providers (e.g., AWS Direct Connect and Kubernetes clusters) into their existing services. Making operations even more challenging, the operational and data-access methods of different network services vary widely. For example, cloud providers expose operational metrics only through an API (e.g., CloudWatch), while traditional networking equipment provides SNMP, streaming telemetry, and NETCONF interfaces. As a result, introducing new network services and MOPs is expensive, as they can disrupt networks carrying revenue-generating traffic.

Introducing Selector’s Network Digital Twin (NDT): a digital representation of your network created from real-time data. This service enables efficient decision-making and experimentation without interfering with the real network, reducing risk to your infrastructure. The NDT gathers all network metrics, logs, and events in real time and draws correlations using ML. Unlike legacy solutions that rely on simulation software, the NDT is built on real-time operational metrics and logs. It visualizes and analyzes the present network state, learns from the past, and fast-forwards to predict future scenarios.

Real-Time View of the Network: The NDT offers a real-time view of the network and service topologies. It shows the quality of different paths, including end-to-end latency, any packet drops along the path, and queue build-ups. This information allows for faster troubleshooting and increased resilience by pinpointing the exact moment an issue occurs and tracing its effects across the network.
The paths are learned through a variety of mechanisms, such as real-time BGP-LS streams, RIBs via BMP, and LLDP polling via SNMP.

Network DVR: The NDT records all historical data according to the retention policies configured on the platform. This capability is used to play back past events and diagnose the root cause of an issue. Outages and chronic issues are often caused by a slow buildup of bad events; for example, slight topology changes increase the number of hops in a path, which in turn increases latency. The NDT addresses such issues by detecting the latency increase using ML, rewinding through the past to show when the topology changed, and highlighting the configuration change that may have mistakenly increased the cost of a link, causing the path change.

What-If Analysis: Using the NDT’s experimentation platform, customers can stress-test network resiliency against link or optical faults, device failures, or the impact of excess traffic from a potential DDoS attack. Users can identify the sites that are prone to blackouts by simulating link and node failures. For example, simulation scenarios can be created in which the number of prefixes advertised by a route reflector is increased; one can then observe whether any circuit breakers that limit prefix advertisements get tripped. This feature allows network operators to identify potential choke points in the network and address them proactively.

Growth Planning and Cost Optimization: The NDT allows customers to identify and resolve network bottlenecks and repurpose underutilized resources. Customers can model different growth scenarios as new users or new services come aboard the network. This type of scenario planning helps customers manage their CapEx and OpEx budgets and run a profitable network. Using ML-based predictive algorithms, the NDT can proactively inform operators that, for example, the number of 10G ports in a given network location should be increased because user demand in that location is likely to grow.
Summary

Selector’s Network Digital Twin transforms how you operate your network. The NDT is the DVR of networking: by looking into the past, it helps you make decisions for the future. Its experimentation platform allows you to test new services and network configurations without putting your existing network at risk. Ready to see your network in a new light? Contact Selector today to learn more about our digital twin solution.
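The what-if analysis described in this post can be illustrated on a toy topology. The sketch below is not the NDT’s engine, just a minimal reachability check that answers the same kind of question: does a site survive a given set of failures?

```python
from collections import deque

# Toy topology as an adjacency list (invented for illustration).
links = {
    "site-a": ["core-1", "core-2"],
    "core-1": ["site-a", "site-b"],
    "core-2": ["site-a", "site-b"],
    "site-b": ["core-1", "core-2"],
}

def reachable(links, src, dst, failed=frozenset()):
    """Breadth-first reachability check, skipping devices marked as failed."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nbr in links.get(node, []):
            if nbr not in seen and nbr not in failed:
                seen.add(nbr)
                queue.append(nbr)
    return False

# What-if: does site-a still reach site-b if core-1 fails? If both cores fail?
survives = reachable(links, "site-a", "site-b", failed={"core-1"})
blackout = not reachable(links, "site-a", "site-b", failed={"core-1", "core-2"})
```

Running scenarios like these over every link and node pair is how a twin flags sites that are one failure away from a blackout; a production twin would additionally replay routing protocol behavior, capacities, and traffic matrices.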
Understanding the state of your network and infrastructure is a critical responsibility for operations teams. Without their ever-watchful eye, network issues can cause problems ranging from annoying performance degradation to outright downtime. To detect, prevent, and address these issues, operations teams have relied on a combination of monitoring and manual correlation, leveraging whatever tools were available. This approach has proven time-consuming for staff, prone to human error, and liable to cascade into further issues when problems are misdiagnosed. Moreover, the complexity and volume of data generated by the environment continue to grow, straining resource-strapped Ops teams.

Fortunately, the combination of observability and AIOps introduced by Selector offers a new approach to these challenges. Selector auto-discovers devices, metadata, and relative topology, and stores the data within a hosted Configuration Management Database (CMDB). The collected metadata informs per-device profiles that drive telemetry collection. The telemetry is then continuously analyzed by AI/ML, which correlates anomalies and identifies root causes. The overall solution accelerates troubleshooting, enabling operators to focus on remediation.

Discovering Devices and Collecting Topology

A key advantage of Selector’s solution is its ability to automatically collect and leverage the inventory and topology of the network. Selector offers an agent to discover and map your network infrastructure. The process works much as a network engineer would approach the task: using credentials (kept securely, on-site) to incrementally log into the various network devices in your environment and collect the related topology data. Once connected to a device, Selector fingerprints the vendor, model, and version. Network topology data such as ARP, STP, CDP, LLDP, routing protocol sessions, and more are then used to decide where to go next, and the process repeats.
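The crawl just described, log in, fingerprint, read neighbor tables, then repeat, amounts to a breadth-first walk of the network. A minimal sketch, with a stubbed neighbor lookup standing in for real LLDP/CDP queries (device names and tables are invented):

```python
from collections import deque

# Stubbed LLDP/CDP neighbor tables; a real agent would query each device.
neighbor_tables = {
    "core-1":   ["dist-1", "dist-2"],
    "dist-1":   ["core-1", "access-1"],
    "dist-2":   ["core-1"],
    "access-1": ["dist-1"],
}

def discover(seed, get_neighbors):
    """Breadth-first discovery starting from a single seed device."""
    inventory, queue = {seed}, deque([seed])
    topology = []
    while queue:
        device = queue.popleft()
        # In practice: log in with stored credentials and
        # fingerprint vendor/model/version here.
        for nbr in get_neighbors(device):
            topology.append((device, nbr))   # record the discovered link
            if nbr not in inventory:
                inventory.add(nbr)
                queue.append(nbr)
    return inventory, topology

inventory, topology = discover("core-1", lambda d: neighbor_tables.get(d, []))
```

Starting from one seed, the walk recovers the full device inventory and the link list, which is exactly the data a CMDB needs to become the source of truth for the environment.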
Syncing Network Topology

Topology data and related metadata are synced to Selector’s internal CMDB. CMDBs were historically little more than a basic file containing details of an organization’s IT environment: names of network entities, IP addresses, and credentials. In today’s complex IT environments, however, CMDBs have become critical tools for IT administrators. Warehousing the complete inventory, topology, and metadata for all discovered devices, Selector’s CMDB serves as the single source of truth for everything related to your environment. Customers can use Selector’s CMDB for a variety of inventory and audit workflows. But the CMDB has another crucial role: enabling Selector’s AI/ML to connect the dots across your devices.

Telemetry Collection, Processing, and Outcomes

Using a lightweight agent, Selector collects SNMP and streaming telemetry from each discovered device, exercises network connectivity with synthetic traffic, and receives syslogs. The agent then securely transmits the associated telemetry to an instance of Selector hosted in the cloud or on your premises. Selector continuously analyzes this telemetry, applying AI/ML techniques to baseline it and identify anomalies. Next, contextual, temporal, and topological correlation determines the relationships between those anomalies. Associative models then identify the probable root cause of emerging incidents.

The solution is further differentiated by Selector’s AI/ML-driven Smart Alerting. Smart Alerting works by baselining telemetry, understanding what is typical for your environment, rather than relying on the static, manual thresholds of legacy solutions. This helps ensure that alerting is meaningful rather than an arbitrary preference. Selector’s correlations automatically consolidate and deduplicate the telemetry data collected by the system, ensuring the team receives discrete, actionable alerts.
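The difference between a static threshold and a learned baseline can be shown in a few lines. This is a deliberately simple z-score sketch, not Selector’s actual baselining models:

```python
import statistics

def smart_alert(history, value, sigmas=3.0):
    """Flag a value only if it deviates from the learned baseline,
    rather than comparing it against a fixed, hand-set threshold."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) > sigmas * stdev

# Interface utilization that normally hovers around 40%.
baseline = [38, 41, 40, 39, 42, 40, 41, 39]

# 70% is anomalous relative to this baseline, even though a static
# "alert above 80%" rule would have stayed silent; 42% is normal variation.
spike_flagged = smart_alert(baseline, 70)
normal_ok = not smart_alert(baseline, 42)
```

The key property is that the same function adapts per metric and per device: a link that normally runs at 90% never pages anyone at 91%, while a link that normally runs at 40% does page at 70%.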
Selector’s correlations can also be represented as JSON objects, through which the team can drive various operations activities, for example ticket creation and maintenance, and even closed-loop automation. A GenAI-based conversational interface further enables operators to explore telemetry using natural language, and dynamic dashboards automatically provide relevant views into telemetry of interest.

Conclusion

Selector’s modern network observability offers a robust solution for operations teams to efficiently detect, prevent, and address network issues. With its automated device discovery, telemetry collection, and smart alerting capabilities, Selector’s solution automates much of the toil involved in setting up an effective observability platform. Further, AIOps moves teams toward proactive management of their networks, helping to reduce downtime and improve overall performance.
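For reference, a correlation rendered as a JSON object, as mentioned above, might look like the hypothetical payload below. The structure and field names are a guess for illustration, not Selector’s actual schema; the point is that a single machine-readable object can feed ticketing or closed-loop automation:

```python
import json

# Hypothetical correlated-incident payload an ITSM webhook might consume.
correlation = {
    "incident_id": "inc-001",
    "severity": "critical",
    "root_cause": {"device": "core-sw-2", "signal": "line-card failure"},
    "symptoms": [
        {"device": "rtr-5",   "signal": "bgp_adjacency_down"},
        {"device": "host-12", "signal": "latency_spike"},
    ],
    "actions": {"ticket": "create", "assignee": "noc"},
}

payload = json.dumps(correlation)   # what would be POSTed to a ticketing system
restored = json.loads(payload)      # what the downstream consumer parses
```

Because root cause and symptoms travel together in one object, a downstream system can open a single ticket instead of one per raw alert.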
Selector returned to Networking Field Day this year to present our latest developments in network AIOps. Cofounder and CTO Nitin Kumar, along with VP Solutions Engineering Debashis Mohanty and Principal Solution Architect John Heintz, explored how Selector’s GenAI-driven conversational interface promises not only to address today’s network operations challenges, but to transform the industry. Read on to catch the highlights of the live-streamed presentation, which occurred on July 11, 2024.

Today’s Network Operations Challenges

The scale and complexity of modern networks have made it increasingly difficult to make sense of the available data and support incident remediation workflows. For instance, if cash registers at a retail establishment cannot connect to the payment application, the issue could reside in a number of different elements within the stack, including the Wi-Fi, local area network (LAN), software-defined wide area network (SD-WAN), Internet, multiprotocol label switching (MPLS), cloud services (e.g., AWS, Azure), or even the payment application itself (Figure 1). Experts or vendors for each of these domains may have access to the associated data, but the average user can neither collect all the data nor holistically interpret it. The ability to access the data and associated insights from all of these domains is known as data democratization. Selector supports data democratization through our GenAI-powered conversational interface as well as our consolidated smart alerting, both available via the customer’s preferred communication platform.

Bridging the Chasm

Obstacles currently prevent the application of general-purpose AI services (e.g., ChatGPT, Microsoft Copilot, Google Gemini) to data originating from network infrastructure. To start, there’s the logistical issue of transporting on-premises network operations data to these predominantly cloud-based services.
The raw data is not only spread across thousands of devices, but often originates from different domains, vendors, and protocols. Further, the data exists in multiple formats, such as metrics, logs, events, and topology, and the accompanying metadata may be inconsistent or malformed. Additionally, general-purpose AI services are trained on English text, not the nuanced semantics of network operations data. And lastly, these AI services may mine your data and pass it along to other organizations, presenting a significant security and privacy risk. Selector solves these issues, and effectively bridges the chasm, through our innovative approach to data collection, processing, and analytics, in conjunction with precisely trained LLMs (large language models).

Mapping English Phrases to Queries

Selector’s processing of English phrases and subsequent application of GenAI relies on two core features. The first is a uniform query interface added to the data storage layer; for this interface, Selector chose SQL (Structured Query Language). The second is LLM translation of the English phrase into a SQL query that can be run against that interface. The diagram below (Figure 2) depicts how Selector leverages our LLM to translate an English phrase into a query, as seen in the translation layer. The returned query then runs on the query layer, which sits above the data storage layer.

Inside the Selector LLM

The Selector LLM first determines which table or tables to query. Then it focuses on keywords (e.g., Ashburn, GigE100*, last 2 days), applying these as filters to the table. The LLM manages these tasks through a process known as imputing, or inferring (Figure 3). A typical function takes input and computes output using a set of parameters. An LLM takes output and, using a model, imputes a function and its parameters, a manner of reverse mapping. At its core, the Selector LLM relies on a base, open-source model from the public domain.
From there, it is trained with English data, enabling it to understand English-language phrases. Next, Selector fine-tunes the model with a dataset that teaches it to convert from English to Selector’s SQL query interface. These two steps do not involve the customer and occur in Selector’s cloud; we publish and deploy updates to this model periodically. Then, for each customer deployment, Selector fine-tunes the LLM with client-specific data and entities, a process which occurs on-premises at the customer. In this way, models are customized to each client’s unique solution. For example, Customer A’s model is different from Customer B’s and Customer C’s (Figure 4). The models are atomic and can run on-premises or in the cloud, depending on the client’s preference.

Selector Copilot

Selector Copilot, our GenAI-driven conversational interface, enables network operators to ask questions in plain English. For example, an operator might prompt Selector Copilot to “Show me errors in Ashburn” or pose the question, “Are there any port errors in Ashburn on all GigE100* in the last 2 days?” For each query, Selector Copilot returns a visualization and summary of the results. Users can then interact with Selector to explore these results. They can also copy and paste each visualization from Selector Copilot onto a dedicated dashboard.

As an example, let’s say a cloud service provider asks Selector Copilot about the latency of their transit routers over the last two days. Selector Copilot delivers a visualization and summary, revealing higher latency within a certain region. The user can then drill down into network topology information to further investigate, accessing relevant details about devices, interfaces, sites, and circuits.

Selector Alerting

Selector’s consolidated alerts reveal the nature and severity of an incident along with the probable root cause and a comprehensive list of affected devices.
Selector’s alerting process relies on a pipeline in which machine learning and statistical algorithms first determine violations or anomalies across all network, application, and infrastructure data. These events are then clustered into correlation trees or graphs, each of which indicates a single incident. Selector summarizes these underlying event correlation graphs into consolidated alerts delivered via the customer’s preferred collaboration platform, such as Slack or Microsoft Teams. Alerts can also be configured to automatically create tickets and map all relevant information into them, enabling downstream teams to commence remediation activities.

Watch the Selector presentation at Networking Field Day 35
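The phrase-to-query mapping described in this post can be illustrated with a toy translator. In production an LLM infers the table and filters; here that inference is hard-coded with regexes purely to show the shape of the mapping, and the table and column names (port_errors, site, interface, ts) are invented, not Selector’s schema:

```python
import re

def to_sql(phrase):
    """Toy stand-in for the LLM translation layer: pick a table,
    extract keyword filters, and emit a SQL query."""
    table = "port_errors" if "error" in phrase.lower() else "latency"
    filters = []
    if (m := re.search(r"\bin (\w+)", phrase)):          # site, e.g. "Ashburn"
        filters.append(f"site = '{m.group(1)}'")
    if (m := re.search(r"on all (\S+?)\*", phrase)):     # interface glob, e.g. GigE100*
        filters.append(f"interface LIKE '{m.group(1)}%'")
    if (m := re.search(r"last (\d+) days", phrase)):     # relative time range
        filters.append(f"ts > now() - interval '{m.group(1)} days'")
    where = " AND ".join(filters) or "true"
    return f"SELECT * FROM {table} WHERE {where}"

sql = to_sql("Are there any port errors in Ashburn on all GigE100* in the last 2 days?")
```

The real system replaces the hard-coded rules with a fine-tuned model that imputes the table and filters from the phrase, but the output contract is the same: a SQL query the uniform query interface can execute.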
Quality medical care today relies on “health systems” built from geographically distributed healthcare settings such as hospitals, urgent care clinics, imaging centers, nursing homes, pharmacies, and specialist offices, among many others. Each setting shares data within the broader health system through Electronic Medical Records (EMRs). EMR systems, which were purpose-built to manage patient records, help improve patient outcomes through the real-time sharing of patient data. Several EMR systems are available today, with Epic being the most prominent. Over time, EMRs have evolved into a broader healthcare operating system, helping administrators manage health records, nurse work assignments, nurse reviews, employee assessments, payroll, budgeting, recruiting, patient portals, patient records, and more. Because EMRs are so deeply ingrained into the hospital’s business, even the slightest performance issue materially affects the entire organization. Unfortunately, performance issues occur frequently, and the impact on the organization is often severe. This is because the EMR system is part of a complex healthcare IT ecosystem, where an issue in one place, such as the network, can wreak havoc across all the systems within the environment. Given these risks, organizations must ensure the proper health of the IT environment across their stack—from network, compute, and storage infrastructure to cloud and EMR applications. Historically, administrators relied on a suite of products to help address these challenges. IT teams would manually inspect the data and connect the dots to resolve problematic issues. As one might imagine, this process is time-consuming and requires significant effort on behalf of the team. 
Fortunately, Selector’s purpose-built solution for healthcare observability delivers comprehensive full-stack observability and powerful AI/ML-driven capabilities, such as root cause analysis and event correlation, that continuously analyze the environment, surfacing insights in real-time. The Challenges of Monitoring Healthcare IT The historical challenges of performance monitoring and troubleshooting EMRs are mainly due to the complexity of the ecosystem they are within. This ecosystem includes network, compute/SAN, virtualization/VSAN, and applications, and is often composed of hundreds to thousands of servers, switches, routers, firewalls, access points, tablets, mobile desktops, and more. Complex Ecosystem Network All services sit on top of the network, which can often involve thousands of different types of devices. Because network performance issues inevitably affect the services sitting on top of it, understanding network health is essential. However, this is challenging due to the complexity of modern networks and challenges such as Wi-Fi congestion, routing and switching issues, firewall misconfigurations, and transient capacity constraints, to name a few. These conditions all translate to difficult-to-troubleshoot problems across the IT environment. Compute/SAN On top of the network is the physical infrastructure, including CPU, memory, and disk, all connected via the network. Also at this layer are the various storage solutions employed by the health system to support the long-term storage of medical records and patient imaging. Virtualization The VMware ecosystem is built on top of compute and storage infrastructure and consists of the ESXi, vSphere, and VSAN services. Monitoring VMware’s health is essential because if the VMware hosts are not working correctly, the guest virtual machines are unlikely to operate as expected. Applications Many different applications support the hospital and are typically hosted within VMware. 
These include the various services that comprise Epic, including BCA (prescriptions), EPS (printing services), and BLOB, the binary object store that manages the different aspects of collecting and retrieving the data. Limitations of Current Solutions Given the complexity of healthcare IT ecosystems, issues can easily go unnoticed. Consider this scenario: Hard drives degrade within the SAN servicing the block store service—Epic’s BLOB service, a critical component of the Epic ecosystem. Aggregate seek times, throughput, and general performance of the SAN begin to decline. Over time, users start to complain. Everything works, but things seem slower. Nurses, doctors, and administrators are all experiencing the issue. Epic’s integrated monitoring—SystemPulse—shows the application is working as expected, but the reports continue. At this point, the team will start guessing what the problem is. Meanwhile, the performance degradation has begun to materially affect the organization. Quickly resolving issues like these requires an observability solution that provides comprehensive visibility across the entire ecosystem, from the network through the applications. This solution also needs to collect and correlate telemetry across every single one of these domains. Further, due to the sensitivity of healthcare data, a solution must be able to securely collect data from on-premises environments. Unfortunately, many existing solutions do not meet all these needs. Products like SolarWinds, for instance, do not have all the data (they have no way of collecting Epic data, for example) and cannot bridge the data between different domains, so capabilities such as automated correlation and root cause analysis are not possible. Without a solution to address the challenges of this environment, identifying and resolving problems continues to be time-consuming, creating headaches for all healthcare staff trying to leverage essential systems like Epic on a routine basis. 
Selector: Observability and AIOps for the Epic EMR Environment To help hospital operators proactively optimize performance and identify issues before they cause problems, Selector provides a purpose-built technology for end-to-end observability of the healthcare IT environment, which can be deployed on-premises or in the cloud. Comprehensive Visibility Enables True Full-Stack Observability Selector provides comprehensive observability by collecting and analyzing real-time performance telemetry from the network to applications and everything in between. It collects the metrics, logs, and events as well as all the metadata from every layer within the healthcare IT ecosystem—network, compute, storage, cloud, virtualization, applications—and warehouses the data in the Selector data lake. As a result, Selector enables true full-stack observability, accounting for the network layer when most other vendors have ignored this critical component. Auto-baselining Immediately Surfaces Abnormalities Selector applies machine learning to all metrics and logs from its ingest layer, surfacing anomalies. It baselines telemetry in real-time and immediately identifies when a system’s performance deviates from its normal behavior, including accounting for cyclicity—what’s expected during certain times of day—and seasonality—what’s typical for certain times of the year. These baselines are then used to power dynamic alerting rules, sparing IT teams the burden of maintaining static thresholds.
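Selector’s baselining models aren’t detailed here, but the core idea of accounting for cyclicity can be illustrated with a toy per-hour baseline. The utilization figures below are invented, and a real system would use far richer models than a per-bucket mean and standard deviation:

```python
import statistics
from collections import defaultdict

def build_baseline(samples):
    """samples: list of (hour_of_day, value) pairs. Returns per-hour
    (mean, stdev), so 'normal' accounts for daily cyclicity."""
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    return {
        hour: (statistics.mean(vals), statistics.stdev(vals))
        for hour, vals in buckets.items()
        if len(vals) >= 2  # stdev needs at least two points
    }

def is_anomalous(baseline, hour, value, n_sigma=3.0):
    """Flag a value that deviates from the hour's learned baseline."""
    if hour not in baseline:
        return False  # no history yet for this hour of day
    mean, stdev = baseline[hour]
    return stdev > 0 and abs(value - mean) > n_sigma * stdev

# Hypothetical history: utilization is routinely high at 09:00
# and routinely low at 03:00.
history = [(9, v) for v in (80, 82, 79, 81)] + [(3, v) for v in (5, 6, 4, 5)]
baseline = build_baseline(history)
```

With this baseline, a reading of 80 at 09:00 is unremarkable, while the same absolute jump at 03:00 would be flagged; that is the benefit of dynamic, cyclicity-aware thresholds over static ones.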
For the second consecutive year, Selector has been recognized as a Best Place to Work in the Bay Area. In 2023, the company was ranked among the top 10 companies with 25–49 employees. This year, the company ranked again in the top 10. This regional award is sponsored by the San Francisco Business Times and Silicon Valley Business Journal in collaboration with Quantum Workplace—an employee engagement data firm. The winning Bay Area companies for 2024 were announced in March, with the ceremony and ranking on June 13. Learn what this honor means for Selector as we continue to advance the world’s first unified observability and AIOps platform. Measuring Employee Engagement and Satisfaction The Best Places to Work in the Bay Area award recognizes leading employers in the region. These companies stand out for enabling employee engagement and workplace satisfaction, cultivating an environment where employees are happily committed to their work, advocate for their company, and plan to stay with the company. To be nominated, companies must have an office in any of the twelve area cities and a minimum of 25 employees. These employees must work permanently at the company’s Bay Area office or be US-based remote employees who report to that office. After a company is nominated, Quantum Workplace conducts a survey to determine the companies in which employees thrive, engage in meaningful work, and feel their voices are heard. For Selector, this survey went to employees who work at or report to our Santa Clara, CA headquarters. The survey allows employees to share their experience working for the nominated companies. It measures various research-validated workplace factors that impact employee engagement and satisfaction. For example, it inquires about compensation, benefits, employee engagement, and trust in senior leadership. Each question on the survey has a corresponding set of response options, each assigned a numerical value. 
Quantum Workplace uses these values to profile employees, calculate an overall score, and determine each organization’s rank. The award winners are companies that achieve the highest ratings for being a fun workplace with a collaborative culture, solid compensation and benefits, fair management practices, and other amenities. A Reflection of Selector’s Commitment to Its Employees Upon receiving the 2024 Best Places to Work in the Bay Area title, Selector Co-Founder and CEO Kannan Kothandaraman said, “This award showcases the quality and character of our employees, their dedication to each other and our company, and their steadfast commitment to delivering unparalleled solutions and customer outcomes.” Eric Moore, Selector’s Vice President of Worldwide Sales, added, “When employees recognize a company’s appreciation for them, they are motivated to perform above and beyond, to really take pride in their work. This not only drives product innovation but translates to an overall enhanced customer experience.” Survey feedback included the following employee comment, “I’ve held quite a few jobs over the years, and none have felt as engaging and exciting as my role at Selector. It’s a really great balance between startup culture, where no two days are quite the same, and a mature company with stability and support.” Another employee commented, “I appreciate the innovative culture at our company [and] our commitment to excellence. I am fortunate to work alongside talented and dedicated colleagues, who inspire me to strive [to do my best]. I appreciate the supportive atmosphere where everyone is willing to share knowledge and help each other grow.” A Respectful, Supportive, and Collaborative Workplace Selector strives to foster an environment that inspires its employees’ curiosity and creativity, and empowers them to share ideas, enabling a respectful and collaborative workplace. 
Selector employees can choose their preferred work environment—remote, hybrid, or on-site at its headquarters in Santa Clara—to best suit their lifestyle and productivity. Additionally, they can shift from one environment to another throughout the year as their lives and schedules require. Another Achievement in 2024 for Selector Selector’s flexible, collaborative environment enables its employees to do their best work, and the team has achieved a number of successes in just the first half of this year. Winning this award is a further testament to the team’s incredible drive and dedication—fostered by one of the best places to work within the Bay Area.
Network performance plays a key role in service delivery, acutely impacting user experience. However, enterprise teams have long struggled to gain comprehensive insight into network performance and, when necessary, the ability to hold circuit vendors accountable. Fortunately, Selector’s advances in network monitoring and observability support detailed visibility into circuit performance, making it possible to establish service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs) for circuits. These correspond to the promises organizations make to their customers, the internal objectives that help organizations keep those promises, and the direct measurements organizations use to assess their performance. In fact, Selector can collect and analyze telemetry from your full stack, delivering insight into SLAs, SLOs, and SLIs not only for the network, but for infrastructure, cloud, and applications as well. Network Reliability as a Discipline SLAs, SLOs, and SLIs are used extensively within the discipline of site reliability engineering. However, the network domain has been historically underserved. Innovations in tooling, philosophies, and practices oriented around reliability and performance are largely applied elsewhere. Fortunately, we can learn from these innovations to improve delivery of the network. For instance, in the infrastructure and application spaces, we have the concept of a site reliability engineer (SRE). This role has become table stakes for any enterprise looking to competently deliver a service. In contrast, network reliability engineers (NREs) remain relatively uncommon. NRE responsibilities are similar to those of SREs, but specifically adapted to measuring and stabilizing the reliability of the network to align with enterprise goals. A likely reason for the obscurity of NREs, as well as the absence of impactful practices such as circuit SLAs, SLOs, and SLIs, has been insufficient data and tooling. 
However, the tide has turned. Selector’s unified monitoring, observability, and AIOps platform is uniquely positioned to provide teams with essential insight into network health and performance. Measuring Circuit Performance with Selector To achieve a comprehensive view of network performance, circuit performance must be taken into account. Selector helps operators assess circuit performance by providing unprecedented visibility into circuit KPIs such as latency, jitter, and error rate. Clients benefit from an integrated network monitoring solution, accessible through a unified dashboard, that replaces key functionality historically addressed by multiple tools. Most crucially, Selector’s platform assists operators with defining and continuously monitoring their SLAs, SLOs, and SLIs. To measure SLA compliance, for instance, Selector first defines the SLA with respect to circuit KPIs such as throughput, latency, jitter, errors, uptime, and flaps. These KPIs are then combined to calculate an overall SLA score. Selector will even adjust the score to take into account any maintenance windows published by circuit providers. For instance, if a provider announces it will be down for six hours due to system maintenance, that downtime will be excluded from the SLA calculation. In the image below, a client is using a Selector dashboard to track circuit performance from several different network providers, revealing compliance across multiple kinds of links that are available in a particular location. At the top of the dashboard, each SLA card displays the aggregate performance across all the circuits offered by a given vendor. As you can see, all 100+ connections provided by Vendor-2 are summarized within the first card, on the upper left. 
According to this snapshot, the client can see that Vendor-2 is only meeting their contractual commitments 78.51% of the time, which, depending on the agreement, may be grounds to receive credits back or trigger termination of that contract. Clients can also select a time period, such as a calendar month, over which to view SLA or SLO compliance as articulated within the commitment or agreement. The table below the cards lets customers investigate specific circuits of interest. For example, users can sort by circuit downtime or circuit availability—our tool provides a ranked list of metrics from best to worst. Clicking on the circuit ID enables users to drill down into a detailed view of the various KPIs related to a given circuit (jitter, latency, throughput, etc.), allowing operators to pinpoint specific issues and determine why a given circuit might be failing its SLA. A Trusted Third-Party Assessment of Circuit Performance Selector delivers key circuit performance data directly to clients, so they no longer need to rely on reporting from vendors. In essence, Selector provides a neutral, third-party assessment of circuit performance and circuit SLA compliance. Once compliance insights are generated, Selector can consolidate them into a report and send it to internal and external stakeholders. These reports, which can be generated daily, weekly, monthly, or quarterly, facilitate collaboration and decision-making within an organization. For instance, management or procurement might be interested in learning more about which circuit vendors are performing best, so they can confidently renew those relationships. Alternatively, these teams may want to know which circuit vendors are underperforming, so they can replace them or negotiate a discount. Choose Success The scale and complexity of network infrastructure continue to grow, as do the demands placed on it by service providers and their end-users. 
Operators must rise to the challenge, building and maintaining resilient network solutions that prioritize reliability, availability, and performance. Now, more than ever, these teams must select the appropriate tooling to support their efforts. With robust monitoring capabilities and an impressive suite of AI/ML-powered features, Selector helps operators not only meet today’s rigorous scalability demands, but prepare for those of tomorrow. What’s more, Selector can apply these strategies from network to application, and everything in between.
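The SLA scoring described in this post—combining circuit KPIs and excluding published maintenance windows—can be sketched for a single availability KPI. Selector’s actual score blends multiple KPIs, and the window values below are hypothetical:

```python
def sla_compliance(total_minutes, downtime_windows, maintenance_windows):
    """Percent of measured time a circuit was up, excluding published
    provider maintenance windows from the calculation. Windows are
    (start_minute, end_minute) tuples on a shared timeline."""
    def overlap(a, b):
        # Minutes of overlap between two half-open windows.
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    maintenance_total = sum(end - start for start, end in maintenance_windows)
    # Downtime that falls inside a maintenance window does not count
    # against the SLA.
    counted_downtime = sum(
        (end - start) - sum(overlap((start, end), m) for m in maintenance_windows)
        for start, end in downtime_windows
    )
    measured = total_minutes - maintenance_total
    return 100.0 * (measured - counted_downtime) / measured

# Hypothetical month: one announced 6-hour maintenance window during
# which the circuit was down, plus 90 minutes of unplanned downtime.
total = 30 * 24 * 60                      # 43,200 minutes in the month
maintenance = [(10_000, 10_360)]          # announced by the provider
downtime = [(10_000, 10_360), (20_000, 20_090)]
score = sla_compliance(total, downtime, maintenance)
```

Only the 90 unplanned minutes count against the circuit here, so the score stays just under 99.8% rather than being dragged down by the provider’s announced maintenance.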
Selector has received the 2024 Data Breakthrough Award for Data Observability Innovation of the Year in the Data Management category! This accomplishment marks the second year Selector has received a Data Breakthrough Award, winning Best AIOps Platform in 2023. In this post, learn more about this award and how it reflects our unique approach to data observability. Behind the Data Breakthrough Awards The Data Breakthrough Awards span various categories, including data management and analytics, compute and infrastructure, and industry applications and leadership. Winners of the Data Breakthrough Awards are determined through a thorough review, scoring, and analysis of top companies, startups, and organizations worldwide. Among this year’s winners are Pure Storage, Dremio, Alteryx, Sumo Logic, Western Digital, and Red Hat. A representative for Data Breakthrough said, “Our goal is to deliver the most comprehensive analysis of the data technology industry each year. And with over 2,250 nominations coming in from all over the globe for our 2024 program, the industry evaluation was broad and extremely competitive this year!” A Major Achievement for Selector In recognizing Selector, the Data Breakthrough Awards placed our platform among the best data companies, products, and services around the world that have “broken through” the crowded data technology market in 2024. Selector’s industry-leading technology simplifies today’s complex and sophisticated IT landscape by merging monitoring, observability, and AIOps into a single platform. At its foundation, Selector leverages advanced artificial intelligence (AI) and machine learning (ML) techniques to drive transformative features, including anomaly detection, event correlation, root cause analysis, and smart alerting. 
The Selector platform provides teams with a single pane of glass and key functionality historically addressed by multiple tools, enabling them to alleviate tool sprawl, enhance operational efficiency, and zero in on improving the customer experience. This recognition is among the many Selector continues to receive for our achievements in observability and AIOps.
Selector offers comprehensive monitoring, observability, and AIOps solutions for service providers and enterprises. The process begins with collecting, aggregating, and analyzing multi-domain operational data from various sources, such as SNMP, streaming telemetry, syslogs, and Kafka. Selector then applies advanced AI/ML techniques to power features such as anomaly detection, event correlation, root cause analysis (RCA), smart alerting, and a conversational GenAI-driven chat tool, Selector Copilot. We decided early on that these capabilities would help us observe our own customer deployments. In this post, we’ll explore why we chose to leverage Selector to monitor Selector’s customers. The Foundation: Kubernetes Depending on the nature of an organization’s infrastructure, customers elect to deploy Selector on-premises (on-prem) or in the cloud. To address this, the founding engineers at Selector leveraged a Kubernetes-native stack. This approach enables our team to deploy the same software on-prem and in the cloud. While a Kubernetes-based microservices architecture enables a uniform deployment experience, it does introduce operational complexities for Selector’s site reliability engineers (SREs). Specifically, how does a Selector SRE determine a “healthy” Selector instance? It all comes down to monitoring a complex list of Key Performance Indicators (KPIs). Below are some critical KPIs that are indicative of an instance’s good or bad health. The challenge increases in scale and complexity due to hundreds of microservices running on an instance. The figure below shows a representative set of microservices that typically run in a production-grade deployment. Why We Chose Selector Selector could have used any third-party tool, such as DataDog, BigPanda, Dynatrace, or Grafana. However, each tool only monitors a specific element, such as services, compute, or Kubernetes clusters, and as a result can only provide a partial view rather than a picture of all the building blocks together. 
Moreover, integrating a third-party monitoring solution for the Selector platform could invite its own challenges. Due to these limitations, we used Selector’s platform to observe our customer deployments; the capabilities that we offer to our customers are equally advantageous to our SRE teams. Selector’s platform, Selector Software Monitor (S2M), ingests metrics and logs from all customer instances. We started by collecting a few thousand metrics from customer instances, but today, each instance emits more than a hundred thousand metrics. After ingesting all the metrics and KPIs, S2M performs event correlation and deduplication, providing meaningful insights. If an issue is detected, the platform fires an alert on collaboration channels such as Slack, Microsoft Teams, etc. It enables our SREs to perform quick RCA of an issue, thus reducing mean time to detect (MTTD) and mean time to repair (MTTR). An Example Workflow The platform automatically detects and correlates abnormal issues in real-time. When the platform identifies a new problem, a correlation alert gets published to Slack. Several details stand out about this alert. The SRE can investigate the issue by clicking the Selector Software Monitoring (S2M) URL. The honeycomb dashboards below show a detailed view of the Customer BG instance. The platform goes deeper and provides insight into which service or KPI spiked in utilization. For example, the persistent volume (PV) is the offending KPI in the scenario below. The SRE can then create JIRA tickets by right-clicking on the widget. The above example shows the complete automated workflow provided by the Selector platform, from alerting to identifying, analyzing, tracking, and closing an issue. Summary Selector-on-Selector is like monitoring a large-scale, highly distributed Software-as-a-Service (SaaS) application. Large applications require almost real-time observability and monitoring, which Selector’s product delivers. 
The platform’s ability to ingest data from any source, perform deduplication and correlation to detect anomalies, and generate alerts on any collaboration channel makes it an ideal choice for monitoring the Selector instances. This feedback mechanism empowers Selector SREs to understand customer pain points better and focus on meeting customer objectives.
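As a rough illustration of the KPI-driven health checks an SRE might run against an instance, a red/yellow/green rollup could look like the following. The service names, KPI names, and thresholds are invented, not Selector’s actual KPI list:

```python
def instance_health(kpis, thresholds):
    """Roll up per-service KPI readings into a single red/yellow/green
    health state for one instance.
    kpis: {(service, kpi): value}
    thresholds: {(service, kpi): (warn_level, crit_level)}"""
    worst = "green"
    violations = []
    for key, value in kpis.items():
        warn, crit = thresholds.get(key, (None, None))
        if crit is not None and value >= crit:
            worst = "red"
            violations.append((key, "critical"))
        elif warn is not None and value >= warn:
            if worst == "green":
                worst = "yellow"
            violations.append((key, "warning"))
    return worst, violations

# Hypothetical KPIs for one instance: ingest lag is fine, but
# persistent-volume usage has crossed its critical threshold.
kpis = {
    ("ingest", "kafka_lag_sec"): 4,
    ("storage", "pv_used_pct"): 97,
}
thresholds = {
    ("ingest", "kafka_lag_sec"): (30, 120),
    ("storage", "pv_used_pct"): (80, 95),
}
state, offenders = instance_health(kpis, thresholds)
```

The rollup both grades the instance and names the offending KPI, mirroring the persistent-volume example from the workflow above.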
Selector is excited to give a sneak peek into new features to be included in our forthcoming Spring Release. This release highlights key innovations focusing on integrated generative AI (GenAI) to enable guided troubleshooting and automated incident remediation. It also includes enhancements to several existing features, such as root cause analysis, native monitoring, and observability capabilities. Get an in-depth look at how these enhancements drive efficiency through tool consolidation and provide transformative insights into the health and performance of network and IT environments. Spring Release Highlights The latest Selector release includes several key capabilities. Also, in this release, watch for Selector in the Google Cloud Marketplace in Q2. It will enable frictionless software procurement, simplified vendor management, streamlined pricing, and help burn down cloud commits. Now, let’s dig deeper into each of these highlights. Selector Copilot with GenAI Offered as part of the Selector platform, Selector Copilot combines conversational AI with a natural, human-language query mechanism. It uses retrieval-augmented generation (RAG) to enrich and enhance responses with external data sources. Let’s examine an Enterprise Wi-Fi Network use case to help you understand Selector Copilot’s powerful capabilities. Use case: Enterprise Wi-Fi network In this use case, the operator uses Selector Copilot to understand the status of a Wi-Fi network for an enterprise, such as a hospital, hotel chain, retail store, or bank. The operator starts with the prompt: Why is the user experience in a Wi-Fi zone red? They can then ask more specific follow-up questions. Figure 1 shows this natural language query. Although not shown in this figure, Copilot also includes possible remediations. 
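To make the RAG mechanism concrete, here is a deliberately simplified sketch of its retrieve-then-augment step. Keyword overlap stands in for real vector retrieval, the LLM call is omitted entirely, and the telemetry snippets are invented:

```python
def retrieve(query, documents, k=2):
    """Score documents by term overlap with the query and return the
    top k -- a toy stand-in for the vector retrieval step of RAG."""
    terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents):
    """Assemble an augmented prompt: retrieved context plus the
    operator's question, which would then be sent to the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical telemetry snippets indexed for retrieval.
docs = [
    "AP-12 in zone-3 reports high channel utilization",
    "Payroll run completed successfully",
    "Wi-Fi zone-3 client retries exceed baseline",
]
prompt = build_prompt("Why is the user experience in a Wi-Fi zone red?", docs)
```

Only the Wi-Fi-relevant snippets survive retrieval, so the model answers from grounded operational context rather than from its training data alone.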
Monitoring and Observability With this release, Selector can not only act on top of existing tools, but can also directly collect configuration, metric, event, and log telemetry with its native monitoring capabilities. Selector supports over 500 integrations across networks, infrastructures, clouds, and applications. It also integrates seamlessly with legacy monitoring and observability solutions to provide comprehensive insights into the overall IT environment. Once the data is collected, Selector leverages machine learning to detect and identify anomalies. It then suppresses duplicate and non-actionable events and points toward the root cause of an incident, helping the team reduce mean time to detect (MTTD) and mean time to resolve (MTTR). Root Cause Analysis Causal machine learning helps customers identify related issues and events across multiple data sets to find the underlying root cause of an incident. Digital Twin Create a digital twin of your network and IT infrastructure from the Selector platform. The digital twin feature provides visualizations that help clients understand and optimize their environments. By using Selector’s digital twin models to optimize your network and IT resources, you can reduce both capital and operating expenses. Transformative Insights for Network and IT Ops Tool sprawl has long plagued operations teams’ pursuit of actionable monitoring and observability. Selector’s unified monitoring, observability, and AIOps platform brings disjointed multi-domain telemetry together. The net result is a single pane of glass that helps your team quickly understand what is happening across your IT environment. Innovative GenAI assists with troubleshooting, automates incident remediation, and meaningfully summarizes incidents, helping the team keep your networks and infrastructure up and running. See Selector in action for yourself. Request a demo.
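As a loose illustration of root cause analysis over correlated events—not Selector’s actual causal ML—one can walk a dependency graph and keep only the alerting components whose own dependencies are healthy. The component names below echo the Epic scenario discussed earlier but are hypothetical:

```python
def likely_root_cause(failing, depends_on):
    """Among components with active alerts, return those none of whose
    transitive dependencies are also alerting: a simple graph-based
    stand-in for causal inference.
    depends_on: {component: set of components it relies on}"""
    def upstream(node, seen=None):
        # Collect all transitive dependencies of a component.
        if seen is None:
            seen = set()
        for dep in depends_on.get(node, set()):
            if dep not in seen:
                seen.add(dep)
                upstream(dep, seen)
        return seen

    alerting = set(failing)
    return [c for c in failing if not (upstream(c) & alerting)]

# Hypothetical stack: an application service depends on a VM, which
# depends on a SAN volume; all three are alerting at once.
depends_on = {
    "epic-blob": {"vm-17"},
    "vm-17": {"san-vol-3"},
}
failing = ["epic-blob", "vm-17", "san-vol-3"]
roots = likely_root_cause(failing, depends_on)
```

The application and VM alerts are explained away by a failing dependency, leaving the SAN volume as the candidate root cause—the symptom-versus-cause distinction the feature is built to automate.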