AI for Network Leaders — Powered by Selector

Join us in NYC on March 25th


Selector AI blog

Discover how AI, automation, and observability are transforming network operations. The Selector AI Blog shares expert perspectives, technical deep dives, and real-world insights for IT, engineering, and operations leaders.

All Articles

ActualTech Media Interview with Kannan: Selector Leverages AIOps for Network and Application Observability

Silos—they’ve been the bane of IT admins for ages. And despite advances over the years, they continue to stubbornly hang around. This is a continuing problem, especially for network and application monitoring. “That’s the problem we set out to solve,” says Selector Co-Founder/CEO Kannan Kothandaraman in this episode of ActualTech Media’s Spotlight Series. Kothandaraman spoke with ATM Presenter and Analyst Scott Bekker about how his fully managed service takes the hassle out of ingesting and processing incoming data. Selector uses AI to do the labor-intensive and error-prone work of bringing in and analyzing organizational data from multiple streams, collating that data, and mining it for actionable insights, whether that’s locating a problem that has crashed your network or finding out why an application is running so slowly. Selector can do this no matter where your data is: on-premises, distributed, or in the cloud. More information is available at https://www.selector.ai/

Network Health and Routing Analytics with Selector

Imagine it is 7:00 pm on a Thursday. Your customers are home, finishing dinner, and ready to stream their favorite Netflix series. Your network is enabling it, and with redundancy designs in place, it should be an easy night. Then, one of the aggregation routers in Chicago fails. Capacity-wise, your other router can handle all traffic, and BGP sessions are still up — yet the phone rings. Customers complain they can’t access Netflix. Traffic has dropped. This scenario shows why network health and routing analytics are critical: even when everything appears normal, hidden routing or state inconsistencies can disrupt service.

The Hidden Challenges of Network Redundancy

Redundancy isn’t just about hardware or paths. It’s also about network state: the configurations, routing data, and behaviors that ensure redundant paths actually work when needed. Without network health and routing analytics, operations engineers are forced to react to issues instead of proactively detecting conditions that can lead to service impact.

Proactive Network Health and Routing Analytics with Selector

At Selector, we focus on making an operations engineer’s life easier — not just helping you find the needle in the haystack, but preventing you from ever having to search for it. With Selector Analytics’ Network Health and Routing Analytics package, network operations engineers can move from reactive troubleshooting to proactive detection. By deploying synthetic testing agents across your infrastructure — on-premises, in the public cloud, or embedded in routers and switches — Selector Analytics can surface emerging conditions before they affect service. This is the core value of network health and routing analytics: turning reactive firefighting into proactive prevention.

Enhancing Visibility with Routing Analytics

Beyond health monitoring, Selector Analytics can ingest and analyze routing data using BMP (BGP Monitoring Protocol).
Routing analytics also supports forensic analysis, allowing you to go back in time and examine routing state as it was. This capability accelerates troubleshooting, reduces MTTR, and strengthens proactive operations.

Improving Network KPIs with Proactive Analytics

Networking and IT operations will always balance prevention and reaction. Traditional KPIs like MTTD (Mean Time to Detect) and MTTR (Mean Time to Repair) focus on reactive response. But network health and routing analytics from Selector also improve MTBF (Mean Time Between Failures) by catching failure conditions before they cause impact.

The Start of A New “Rule-Free” Era for Network and IT Operations

No, I’m not advocating that we jump into a new society where there are no rules; that would be catastrophic. We know what “rules” are in network and IT operations. They are all those things we need to program to define the operational boundaries of our infrastructure. We use rules to set acceptable or unacceptable thresholds on metrics, to parse log messages, extract information from them, and classify them properly. In addition, we use rules to determine whether configurations are right or wrong and to define correlation conditions. Rules are everywhere. Ultimately, rules have been our only means, so far, to capture operational knowledge and put it to work.

There is nothing inherently wrong with rules, but they come with extra baggage that makes them very difficult to maintain over time:

- The time and effort it takes to create them in the first place.
- The time and effort it takes to understand existing rules created by someone else a year ago.
- The time and effort it takes to update them.
- The incredibly complex rules required for multi-cloud and highly dynamic microservices operations.

Sometimes rules are created on the assumption that we know what a good or bad state is. Frequently, what is good or bad is so dynamic and contextual that what the rule captures is inadequate, leading to many false positives and alert fatigue. Very frequently, we waste a lot of time reverse-engineering the goal of a particular rule because no one has adequately documented it, and it is unclear what the rule is doing and what the impact would be if it were changed or removed. Snowflakes accumulate, and no one wants to touch them. How often does a regex rule for processing a particular type of log become outdated because the log format changed slightly without prior notice from the vendor? We are wasting a lot of time statically setting operational boundaries that we could set dynamically using a different approach.
There is a lot of operational knowledge embedded in our infrastructure telemetry. We need techniques that allow us to extract it and put it to work.

We do not need rules to define static thresholds that separate good from bad. We can infer from the data itself what is expected and what is not in a specific context, given the hour of the day, the day of the week, or the month of the year, and automatically determine the proper thresholds. There is no need to waste our time on this.

We do not need rules anymore to parse log messages. Natural language processing techniques allow us to extract critical information from a log, irrespective of whether the syntax of the log changes over time. Say goodbye to regex rules.

We do not need rules anymore to classify messages or alerts. Deep learning techniques allow us to understand the scope of a message and classify it accordingly. Say goodbye to those regex rules as well.

We do not need rules anymore to correlate events. If-this-then-that correlation rules belong to the past. Multi-cloud infrastructure is multidimensional, and so must be the correlation. Context-based correlation allows us to retire legacy correlation techniques that are fundamentally linear and not designed to cope with multidimensional, multilayer infrastructures.

And there is more: we do not need rules anymore to evaluate whether a configuration change is right or wrong. We can use deep learning techniques to automatically learn what is right or wrong in a network or IT infrastructure and raise alerts when a configuration change does not fit.

What I’m describing here is not only possible; it is real, enabled by AI/ML techniques. It is time for a new “rule-free” era for network and IT operations. Do not waste your time defining rules or reverse-engineering someone else’s rules.
The future is about you investing your time in how to optimize, evolve and scale your infrastructure to deliver better services, not defining rules to set operational boundaries.
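The “say goodbye to regex rules” idea above can be made concrete with a tiny sketch. This is a hypothetical illustration (not Selector’s implementation): instead of hand-written patterns, cluster log lines into templates by masking the tokens that vary, so lines with the same structure group together even if values or formats drift.

```python
from collections import defaultdict

def template_of(line: str) -> str:
    """Mask variable-looking tokens (anything containing digits, such as
    IPs, counters, or interface names) so that log lines with the same
    structure collapse to a single template."""
    return " ".join(
        "<*>" if any(ch.isdigit() for ch in tok) else tok
        for tok in line.split()
    )

def group_logs(lines):
    """Group raw log lines by inferred template -- no regex rules."""
    groups = defaultdict(list)
    for line in lines:
        groups[template_of(line)].append(line)
    return groups

logs = [
    "BGP session to 10.0.0.1 down",
    "BGP session to 10.0.0.2 down",
    "Interface ge-0/0/1 flapped 3 times",
]
groups = group_logs(logs)
# The two BGP lines share one template ("BGP session to <*> down");
# the interface line forms its own template.
```

Real systems use far more robust techniques (token position, tree-based clustering), but the point stands: the structure is learned from the data, not programmed as a rule.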

Atlantic Bridge + Selector: Advancing Network Operations Intelligence

Network operations teams today face unprecedented challenges. As enterprises expand their digital transformation initiatives, migrate to cloud environments, and adopt an ever-increasing number of enterprise applications, the complexity of IT and network infrastructure grows exponentially, and the demands on operations teams grow with it. While artificial intelligence (AI) and machine learning (ML) offer the potential to simplify operations, few enterprises have the in-house expertise to implement and operationalize advanced AI-driven observability and automation at scale. This is where network operations intelligence becomes essential, helping teams unify data, identify anomalies, and act before problems impact customers.

Selector: Bringing Network Operations Intelligence to Enterprises

Selector, a Silicon Valley-based startup, was founded to address these modern network operations challenges. The company’s mission is to empower network, cloud, and application operators with network operations intelligence by leveraging AI and ML in a practical, scalable way. The company’s flagship offering, Selector Analytics, is a network-aware operations intelligence platform that unifies operations data, surfaces anomalies, and helps teams act before problems impact customers. Even as an early-stage startup, Selector has demonstrated strong customer engagement and has built mature customer relationships across key verticals, showing clear demand for AI-driven network operations intelligence.

A Founding Team with Deep Industry Expertise

Selector was co-founded by Kannan Kothandaraman and Nitin Kumar, both of whom bring a unique mix of networking, application, and data science experience. This experience has allowed Selector to create a network-aware operations intelligence platform that delivers on the promise of AIOps more effectively than traditional solutions.

Why Atlantic Bridge Invested

Atlantic Bridge’s investment in Selector is driven by the growing need for network operations intelligence and the company’s strong positioning to meet it.
Atlantic Bridge expects this investment to accelerate Selector’s growth.

Driving the Future of Network Operations Intelligence

With this investment, Selector is positioned to become a leading provider of network-aware operations intelligence. By combining AI-driven observability, correlation, and automation, the platform addresses one of the most significant pain points in modern IT and network operations: managing complexity while improving reliability and performance. Atlantic Bridge anticipates a significant market impact as Selector continues to execute on its vision and deliver innovative solutions for proactive operations intelligence. To learn more about Atlantic Bridge, visit https://abven.com/

Why SineWave Invested in Selector: Advancing Network-Aware AIOps

Multi-cloud is no longer just a buzzword—it’s the reality of modern IT and network operations. Public, private, and hybrid clouds now coexist to meet enterprise needs, but with this flexibility comes complexity. Downtime is extremely costly. According to industry research, 91% of organizations report that an hour of downtime costs an average of $300,000. In some cases, outages can also lead to legal exposure and reputational damage. As enterprise infrastructure continues to expand, actionable visibility and network-aware AIOps are no longer optional—they are essential.

Selector: Delivering Network-Aware AIOps

Selector, an early-stage Silicon Valley startup, is solving the operational complexity of multi-cloud with its AI/ML-powered Selector Analytics platform. With network-aware AIOps, this approach removes the traditional barriers to analytics, allowing organizations to move faster and reduce operational risk.

A Leadership Team Built for AIOps Innovation

Selector’s leadership team is a key reason SineWave invested. Selector’s platform is already proven with high-caliber customers and is gaining strong traction across North America.

Why SineWave Invested in Selector

SineWave’s Series A investment in Selector reflects a strong belief in the company’s vision and market opportunity.

Transforming the Future of Network-Aware AIOps

SineWave invests in innovative technologies that simplify complexity and maximize the value of enterprise data, and Selector aligns with this vision. With its technology, team, and market opportunity, Selector is positioned to transform enterprise AIOps and help organizations achieve proactive, data-driven operations. To learn more about SineWave, please visit https://sinewave.vc/

Eliminating Pre-Triage Chaos

Getting the whole team on the same page instantaneously

Some significant resources are down, but the information is buried under thousands of alarms. Operations team members jump on a conference call, everyone on the call is looking at a different slice of the operations data surface, and chaos breaks out as the team desperately tries to work out, as quickly as possible, where to even begin triage. Worse still, 10, 20, 30+ minutes may be wasted triaging the wrong issues or chasing ratholes. Imagine if there were a tool that could correlate across all operations information and tell all team members, immediately, where to start triage, eliminating chaos and wasted time. Now there is.

Any Data Correlation

One of the reasons so many people jump on a conference call is that the causes of outages and performance degradation are so varied. There is no way of knowing a priori which piece of information is going to be the golden nugget that creates the aha moment for an anomaly. Tools designed top to bottom with inherent assumptions about a specific source and/or type of data may be great point diagnostic tools. However, they fail to provide insight when the type of data or diagnostic they specialize in does not shed light on the anomaly an operations team is experiencing. A new generation of tool is now available: the Any Data AIOps tool, which can ingest configurations, alarms, metrics, events, logs, inventory, and use-case-specific context. Operations information is correlated across previously siloed data and tools, connecting the dots and pointing operations teams in the direction of where triage should begin. This tool does not replace experienced and skilled operations teams; it gets them to the triage starting line much faster, saving tens of minutes to hours of pre-triage time across many team members, and reducing service downtime.
A Platform and a Tool

While we described the above as a new generation of tools, properly architected, they are actually powerful platforms. The power of these platforms comes from being data, cloud, and screen agnostic. The platform design does not assume any one data type, and easily processes time series data, event data, metrics, text-based data, or any other data. This enables all operations data to be centrally analyzed and correlated, connecting the dots and allowing insights to emerge across the entire operations data surface.

Being truly cloud agnostic means not relying on any proprietary cloud platform APIs or services. With this flexibility and portability, the platform can easily port to any public, private, or hybrid cloud environment. Cloud agnostic is different from cloud native: cloud native might be specific to one cloud implementation, whereas cloud agnostic can execute in any cloud environment.

A post-pandemic world has changed work realities. While people are returning to workplace offices, now more than ever it is important to have operations intelligence available to skilled and experienced employees, whether they are sitting in front of a large monitor or working on a laptop, tablet, or smartphone, and regardless of whether they are at work, at home, or mobile.

Eliminating Chaos and Wasted Time

Instead of 5, 10, 50, or more people getting on a conference call, now all team members can see analysis and correlations across the entire operations data surface, regardless of what device they are using, and see where triage needs to begin. The optimal experts can be allocated to the most productive investigation, dramatically reducing team member time, mean time to detect (MTTD), mean time to repair (MTTR), and network downtime. Team members are more productive, and network service users are less impacted.
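As a toy illustration of what “any data” correlation means in practice (a simplified sketch under our own assumptions, not the platform’s actual algorithm), events from previously siloed sources can be normalized and bucketed into a shared time window, so that anomalies which co-occur across sources surface together as the place to begin triage:

```python
from collections import defaultdict

# Heterogeneous events from different (formerly siloed) sources,
# normalized to (timestamp_seconds, source, summary).
events = [
    (100, "alarms",  "router chi-agg-1 unreachable"),
    (102, "logs",    "BGP neighbor down on chi-agg-2"),
    (104, "metrics", "traffic drop on chicago uplink"),
    (500, "logs",    "disk usage warning on nms host"),
]

def correlate(events, window=30):
    """Group events whose timestamps fall into the same time window.
    Groups spanning many distinct sources hint at where triage
    should begin."""
    buckets = defaultdict(list)
    for ts, source, summary in events:
        buckets[ts // window].append((source, summary))
    # Rank groups by how many distinct sources they span.
    return sorted(buckets.values(),
                  key=lambda evs: len({s for s, _ in evs}),
                  reverse=True)

groups = correlate(events)
# groups[0] holds the three correlated Chicago events from
# alarms, logs, and metrics; the unrelated disk warning is separate.
```

Production systems correlate on far more than time (topology, inventory, configuration), but even this sketch shows why cross-source analysis beats staring at per-tool dashboards.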
Conclusion

Tools that only examine one type of data, or a limited number of data types, may be good triage tools, but they cannot identify anomalies in the parts of the operations data surface they do not have visibility into. Any Data AIOps platforms that can ride the entire operations data wave can connect the dots and identify where triage should begin far better than existing tools, and faster than team members doing this manually while under the stress of network downtime and/or performance degradation. To learn more about how the Selector Platform can eliminate pre-triage chaos, visit the Selector website or request a demo today.

Become a Configuration Commando

Audit, Validation, and Anomaly Detection

Selector Platform Architecture

Introduction

Observability tools have historically focused on logs and metrics, but not configuration information. This is surprising given how often configuration changes lead to outages, as well as their potential to create security vulnerabilities. Enter the Selector Configuration Compliance solution. Operations teams can now audit configuration changes, correlate configuration changes to anomalies, and search for the presence or absence of specific configuration statements. Operations teams can be configuration commandos, with an elite understanding of configuration changes, settings, and consequences.

Anomaly Detection

The basic workflow correlates detected anomalies with recent configuration changes. Selector users can drill down on all correlated information, including the diff and the whole configuration. Through this workflow, anomaly detection is much richer and is correlated to likely causes, whether that is a fault in the network or a configuration change. Instead of operations teams having to guess at the likely sources of anomalies, there is greater clarity, more efficient triage, lower mean time to detect (MTTD), and lower mean time to repair (MTTR). With the impact of the configuration change identified, operations teams can roll back the change if needed, using an approved process. This solution highlights the strength of the Selector architecture, capable of correlating any data source or data type, including configuration changes.

Configuration Search / Validation

Ever been in a situation where you are about to go live with a major new network segment and/or change, and wondered if every configuration was set the way it needed to be? Now you can do a configuration search for the absence or presence of any configuration element. Show me the devices that have this BGP statement. Show me the devices that do not have this ACL statement. All with a natural language query.
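As a rough illustration of what such a presence/absence search computes (the device names, configs, and statements here are hypothetical, and Selector exposes this through natural language queries rather than code):

```python
# Hypothetical running configs keyed by device name.
configs = {
    "edge-1": "router bgp 65001\n neighbor 10.0.0.2 remote-as 65002\nip access-list DENY-TELNET",
    "edge-2": "router bgp 65001\n neighbor 10.0.0.6 remote-as 65003",
    "core-1": "ip access-list DENY-TELNET",
}

def devices_with(configs, statement):
    """Devices whose config contains the given statement."""
    return sorted(d for d, cfg in configs.items() if statement in cfg)

def devices_without(configs, statement):
    """Devices missing the statement -- e.g. an ACL that is
    supposed to be present everywhere before go-live."""
    return sorted(d for d, cfg in configs.items() if statement not in cfg)

# "Show me the devices that have this BGP statement."
has_bgp = devices_with(configs, "router bgp 65001")
# "Show me the devices that do NOT have this ACL statement."
no_acl = devices_without(configs, "ip access-list DENY-TELNET")
```

The absence query is the one that catches pre-launch gaps: it answers "which devices are missing the statement they must have" rather than "where does the statement appear".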
A Selector customer recently used this capability before going live with a major, highly visible media event, and was very pleased they did, observing that no other solution on the market has this capability.

Configuration Audit

As with many Selector events, a timeline is constructed so that Selector users can explore it, see correlations, and drill down as needed. With the Selector Configuration Change Timeline, operations teams can see when a change was made, what the change was, and who or what made the change. As automation becomes more prevalent, the frequency of changes is likely to increase. Buried in such changes may be configuration changes made by hackers that create significant security vulnerabilities. A configuration change trace / timeline is essential, as configuration changes are among the most common sources of outages, performance degradation, and, increasingly, security exposures.

Conclusion

While change management processes may provide a check on what planned changes occur, the outcome of a change is unpredictable, and there may be unwanted changes occurring outside of the change management process that operations teams need to know about. Importantly, if there is a correlation between a configuration change and an outage or performance degradation, operations teams need to know about it as rapidly as possible, and then return to the desired KPIs as soon as possible.

The AI in AIOps

Introduction

A new generation of IT operations tools is emerging, collectively referred to as “AIOps”. As the name implies, they are IT operations tools that leverage artificial intelligence and machine learning (AI/ML). As there are often questions about how much AI/ML these products actually use, it is worth exploring an example: the Selector platform. There are three big buckets of AI. Narrow AI, the class most AIOps tools belong to, is capable of performing specific tasks but is not human-like. General AI is more like human intelligence, and Super AI is intelligence that exceeds humans. While the future of General AI and Super AI is unknown, narrow AI is already making significant contributions, dramatically increasing the efficiency and effectiveness of operations teams.

Supervised Learning

No discussion of AI/ML can exclude supervised learning, because it is perhaps the most widely discussed and best-known machine learning approach. Supervised learning is common in classification and regression, where there is a desire to train a model to learn the relationship between a label (this is a cat, this is a dog, ...) and structure in data. There are aspects of AIOps for which supervised learning may be well suited, for example root cause analysis on previously captured anomalies. This blog focuses on rapid anomaly detection, ranking, and contextualization from unlabeled data streaming at a rate of millions of messages per second. Rapid anomaly detection approaches that are already delivering significant value include unsupervised learning and self-supervised learning. Neither requires long training times or relies on large datasets of previously labelled data.

Unsupervised Learning

Unsupervised learning is a way of learning structure without any hints about what that structure is, i.e., without labelled data. An example is clustering like structures.
This approach is extremely helpful in network health applications where the goal is to contextualize and correlate many different types and sources of data. Unlike static rules-based systems, with unsupervised learning a dynamically assembled and filtered view of networks and applications is created. Analysis adjusts to the available data, rather than relying on rules-based logic that works best only in the presence of specific data. Anomaly sources and connections to other resources are identified. Correlation also occurs across time: a configuration change that occurred before the emergence of an anomaly can be correlated with the anomaly, and operations teams can quickly drill down on what the configuration change was. Other context, such as inventory data, can be used to provide a richer view of anomaly sources and connections. A clear, noise-reduced story of health is created, focusing the energies, experience, and skills of operations teams on the next steps: collaboration and action. Most importantly, this approach discovers anomalies that have never been seen or occurred before. This is critical in increasingly dynamic and complex environments.

Self-Supervised Learning

Self-supervised learning is an approach often asserted to be close to how people learn: learning by observation. In network health applications, it is common to monitor thousands of specific measurements and then alert based on threshold violations. Thresholds can be set manually by operations teams, or default heuristics can be applied. Both approaches have challenges, ranging from the time they take to maintain to a flood of false alarms. A better approach is to use self-supervised learning: observe what is “normal”, and then alert based on algorithmic deviations from a normal-based prediction. In this way, thresholds are not only dynamic, they are automated. Less noise, and less manual threshold setting by operations teams.
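The “observe normal, alert on deviation” idea can be sketched minimally. This is a toy baseline model under our own assumptions (a mean-plus-k-sigma threshold on hypothetical link utilization samples), not the platform’s actual algorithm, but it shows how a threshold can be learned rather than configured:

```python
import statistics

def dynamic_threshold(history, k=3.0):
    """Learn 'normal' from observed history and derive an alert
    threshold automatically: mean + k standard deviations.
    No manually configured rule is required."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return mu + k * sigma

def is_anomalous(value, limit):
    """Alert when a reading exceeds the learned threshold."""
    return value > limit

# Hypothetical link utilization samples (percent) during normal operation.
normal = [41, 39, 42, 40, 43, 38, 41, 40]
limit = dynamic_threshold(normal)
# A reading of 41 is within the learned band; a spike to 90 trips it.
```

Because the threshold is recomputed from recent history, it adapts as "normal" drifts, which is exactly what static, manually set thresholds fail to do.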
Conclusion

The human imagination, aided by science fiction created for entertainment purposes, often sets expectations that are ahead of where technology currently is. In the case of AI, that expectation is General AI and Super AI. However, narrow AI is real and, applied intelligently, is delivering significant value for operations teams. In this platform, the AI/ML behind anomaly detection derives from unsupervised learning and self-supervised learning. In addition to approaches that are recognized as “AI/ML”, other algorithms are used as well, for example recommender systems for ranking.

Facebook Outage Emphasizes Importance of Change Observability

Introduction

On October 4th, 2021, Facebook digital properties were unavailable for six hours. This culturally notable event reminds all networking professionals how often configuration changes and commands are the root cause of network downtime or degraded performance. In addition, the interrelationship of operations events, router security, and physical security is a reminder of the increasing complexity of network operations, and of the need to correlate across multiple different data sources to truly understand any significant issue. In Facebook’s own diagnosis, they said: “a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network.” Source: Facebook, “More details about the October 4 outage”, October 5th, 2021. While Facebook’s outage was caused by a command, other major outages have been caused by configuration changes. For example, of the 2020 outage, Cloudflare wrote: “Today a configuration error in our backbone network caused an outage for Internet properties and Cloudflare services that lasted 27 minutes. We saw traffic drop by about 50% across our network. Because of the architecture of our backbone this outage didn’t affect the entire Cloudflare network and was localized to certain geographies.” Source: Cloudflare, “Cloudflare outage on July 17, 2020”.

Configuration and Command Tracking

While events such as Facebook’s outage become lead stories for news outlets, not all changes are so visible. Some changes cause subtle or limited disruption, and many may never be tracked and root-cause analyzed like a major outage. Observing change (configuration tracking, command tracking, and anomaly correlation) can expose many untracked and unnoticed anomalies. Change observability creates change timelines, correlates those changes with other operations data, and provides powerful search.
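A minimal sketch of the change-timeline idea, using Python’s standard difflib on hypothetical config snapshots (this is an illustration of the concept, not Selector’s implementation): snapshot the configuration over time with the actor that committed it, then diff consecutive snapshots to see when a change happened, what changed, and who or what changed it.

```python
import difflib

# Hypothetical timestamped config snapshots, each with the committing actor.
snapshots = [
    ("2021-10-04T08:00Z", "oncall-bot", "router bgp 65001\nneighbor 10.0.0.2\n"),
    ("2021-10-04T09:30Z", "jdoe",       "router bgp 65001\n"),
]

def change_timeline(snapshots):
    """Yield (when, who, diff_lines) for each pair of consecutive
    snapshots -- the raw material for a configuration change timeline."""
    timeline = []
    for (_, _, old), (when, who, new) in zip(snapshots, snapshots[1:]):
        diff = list(difflib.unified_diff(
            old.splitlines(), new.splitlines(), lineterm=""))
        timeline.append((when, who, diff))
    return timeline

timeline = change_timeline(snapshots)
# Here the single entry shows that "jdoe" removed the BGP neighbor
# statement at 09:30 -- a change worth correlating with any anomaly
# that emerged shortly afterwards.
```

Once each entry carries when/who/what, the timeline can be joined against alarm and metric timestamps, which is the correlation step the article describes.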
Network Orchestration and Automation

At a time when network automation and orchestration are becoming pervasive, the frequency and volume of change is increasing, and hence there is a growing need to track configuration changes, commands, and the current active configuration. Network operations leaders are now asking the question: “what changes are the orchestration and automation systems making?”

Configuration Changes and Cyber Attacks

While major outages are not always caused by cyber attacks (as Facebook has stated theirs was not), it is still true that one way a malicious actor, internal or external, can disrupt an operation is through a configuration change or command. A hacker could also change configuration in such a way that worse attacks become easier. This is another reason why configuration change and command tracking is important.

Increasing Complexity

A configuration change or a command may not have an immediate impact on operational KPIs. The ability to correlate changes with other data is growing in importance as complexity increases. The subtle interaction across data, systems, and time requires a new approach to anomaly detection and ranking. If two or more events are correlated in a data forest and no one notices, then nothing can be done about it.

Conclusion

For multiple reasons, from large visible outages to the many undetected anomalies and network security, configuration and command tracking is crucial. Selector is the leading supplier of usable, multivendor, network-focused AIOps, including capabilities such as CMDB, configuration tracking, configuration timelines, configuration search, command tracking, and correlated anomaly detection and ranking. For more information on Selector Change Observability, see Selector AIOps: Configuration Observability.

Usable AIOps

Both AI and AIOps are terms that have experienced inflated expectations and market disappointment. Part of the problem is that these terms cover a broad range of technologies, from insight engines to artificial general intelligence, as illustrated by the Gartner 2020 Hype Cycle. What any specific vendor means by these terms varies, and these technologies will mature at different points in time. The other challenge is taking technology created for developers and delivering it as solutions usable by operations teams. Selector’s company vision is IT solutions that are usable: productive and compelling across all aspects of the solution. Productive means the ability to complete a desired task quickly and efficiently. Compelling refers to significant impact. Focusing on these qualities delivers the promise of AI/AIOps and crosses the chasm from early adopters to mainstream adoption. This blog explores the ways Selector’s solution, Selector AI, enables compelling productivity through operations analytics, and the spectrum of AI/ML/data science technologies it utilizes.

Correlation — Automatically Connecting the Dots

The history of IT tools is one of each vendor providing its own alert monitoring dashboard. This has resulted in two problems: a proliferation of monitors and dashboards, and siloed data. Some solutions have put significant work into consolidating alerts from different sources, but have provided limited value-add due to a large number of hierarchical dashboards that are difficult to navigate and static in nature. Many operations tools focus on one specific type of data, or are just monitors that do no significant analysis at all. The Selector vision is that IT teams should NOT be running between different monitors trying to correlate different lenses on an issue, or slowly navigating complex dashboard hierarchies. Instead, data from different sources should be correlated to quickly and efficiently focus IT experts in the optimal area for their experience-driven deep-dive analysis.
Selector AI connects the dots across a growing number of data sources: telemetry, logs, configuration changes, synthetic testing, metadata, and inventory. This is not just collecting and displaying. This is analyzing to connect the dots, dramatically reducing the time operations teams spend deciding where to focus their experience, skills, and energies.

Automation — Eliminating Solution Administration

Automation of core IT assets is a significant trend. Little will be gained by automating the core assets if operations tools require significant additional manual overhead. Selector AI is automated by design. One example is alarm threshold setting. Instead of IT teams having to set thousands of thresholds manually, or relying on inaccurate heuristics, Selector AI auto-baselines and dynamically sets thresholds using self-supervised machine learning. Another example is automated synthetic testing. Most importantly, the entire analytics workflow is automated. Data is automatically ingested, analyzed, and made available for visualization, query, northbound interfaces, and automation playbooks. While Selector AI does provide the ability for data science experts to adjust workflows, Selector AI’s default approach makes a spectrum of data science processes and technologies usable by mainstream IT adopters.

Dynamic ETL — Extensible Data Sources

Many data science pipelines begin by transforming different data types and structures into a common structure that is optimized for query and analysis. This simplifies the analytics logic by eliminating the need for analytics logic to know the details of different data sources — different data sources can be added without changing the analytics code. Selector’s focus on customer productivity led to a YAML-based approach to defining data schemas.
As a result, the Selector solution not only comes with pre-integrated data sources, it is quick and simple to add new data sources without changing, recompiling, or reinstalling the solution. Extensible without impacting the availability of the solution. Productive and compelling.

Query-Driven Visualization — Eliminating Complex Dashboard Hierarchies

Solution suppliers invest significantly in anticipating what dashboards operations teams will need. These efforts are well intended, but ultimately only operations teams know what they will need for any given anomaly. That is why Selector enables users to dynamically create, as needed, visualizations and dashboards from queries honed to the specific anomaly they are dealing with. This gives users the information they need without having to navigate complex hierarchies.

Natural Language Queries — Human-Usable Queries

Natural language processing is an important aspect of AI that has seen significant progress over the last decade. Selector believes it is the best way forward for query interfaces, replacing complex, static, and fragile command line interfaces. Instead of asking users to adapt their thinking to unnatural computer query languages, Selector believes query languages should adapt to the way humans already think. Not only does this hide the complexity of the underlying data store, it makes powerful query capabilities usable, with training significantly reduced or eliminated.

Immersive Collaboration — Powerful Capabilities Where Teams Already Work

While there is a role for expert portals, solutions that only provide yet another expert portal continually pull teams out of the collaboration tools they already work in and make operations analytics harder to share. Both of these complicate the operational environment and increase inefficiency. Some solutions enable alerts to be sent to a collaboration environment.
Selector AI goes far beyond that, enabling complex queries to be executed within collaboration environments, including the dynamic generation of visualizations. The generated content is easily, efficiently, and naturally shared with other team members. Just as collaboration tools already operate on many different screens (laptops, monitors, and smartphones), so too do Selector AI capabilities. Selector AI operations analytics can be dynamically generated and shared from any screen, anywhere, anytime — increasingly important in a world of work from home and work from anywhere. Selector AI operations analytics is productive and compelling — usable in the environment that teams are increasingly turning to.

AI/ML Observability — Seeing the Unseen

Selector believes there is a new generation of data science-based approaches that will have an increasing impact on the productivity and effectiveness of operations teams. Not because they will replace operations teams, but because they allow these teams to see what was previously unseen.

AIOps for Experts

Introduction

In the last blog, Collaboration-Centric AIOps, we discussed how deep integration into common collaboration tools, combined with human operations experience, leads to rapid root cause identification and resolution. In this blog, we explore some of the advanced capabilities available to operations experts that allow them to do problem-specific analysis and take problem-specific action.

Natural Language Queries

AIOps has to be not just powerful but usable. The Selector AI analytics engine consists of a relational database, a time series database, a knowledge graph, and other important information repositories. The solution would not be usable if SRE/NetOps teams had to handcraft complex queries for each type of repository; even the need for complex SQL queries would by itself make the solution cumbersome and slow to use. Selector AI makes it easy to extract data and create visualizations using natural language queries. As Figure 1 shows, significant complexity is hidden behind easy-to-use, intuitive, natural language queries. These natural language queries can be used to generate on-demand visualizations within collaboration tools and to create widgets for the expert portal. Visualizations look exactly the same in both the collaboration environments and the Selector AI portal, which is itself a differentiating aspect of the Selector AI solution. An important aspect of Selector AI's natural language queries is their dynamism. As more data is ingested, the vocabulary of the language grows. Once the solution learns from metadata that Chicago, for example, is a location, natural language queries automatically recognize Chicago as a location. No software update is required and no complex query syntax is needed; queries learn automatically from metadata. The knowledge and power of the natural language query is impressive out of the box, and it keeps evolving over time.
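A toy sketch can make the metadata-driven vocabulary idea concrete. This is an assumption for illustration, not Selector AI's actual query engine: once metadata tags "Chicago" as a location, queries mentioning Chicago resolve to a location filter with no code change.

```python
# Toy natural-language query layer whose vocabulary grows from metadata.
# All names and structures here are hypothetical.

class NLQuery:
    def __init__(self):
        self.vocabulary = {}  # term -> field it maps to

    def learn_metadata(self, field, values):
        """Ingest metadata: each value becomes a recognized term for that field."""
        for v in values:
            self.vocabulary[v.lower()] = field

    def parse(self, text):
        """Turn recognized terms in a free-form query into structured filters."""
        filters = {}
        for word in text.lower().split():
            if word in self.vocabulary:
                filters[self.vocabulary[word]] = word
        return filters

nlq = NLQuery()
nlq.learn_metadata("location", ["Chicago", "Dallas"])
nlq.learn_metadata("device_role", ["aggregation", "edge"])

print(nlq.parse("show aggregation router errors in Chicago"))
# prints {'device_role': 'aggregation', 'location': 'chicago'}
```

A production system would handle multi-word terms, synonyms, and intent, but the key property is visible here: teaching the system a new fact ("Chicago is a location") is a data update, not a software update.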
Natural language queries can be used directly within popular collaboration tools, rendering visualizations that are both powerful and exactly the same as those in the expert portal. Seamless analysis across collaboration tools and expert tools; seamless analysis across operations teams. Gone are archaic hard-coded CLIs. Powerful, intuitive, and dynamic: the next-generation approach to extracting specific analysis for specific problems.

Expert Portal

The proliferation of point tools for point problems has led to point portals. Each solution has its own pane of glass: not just an additional pane of glass, but a large-screen pane of glass as the only significant interface to the analytics. In the Selector AI solution, operations teams get the same rendering whether they are using a smartphone, a desktop, or a laptop. These screens are sufficient for collaboration in many use cases, regardless of location or device. While an expert portal is the only interface for many tools, it is an option for the Selector AI solution. The option exists primarily because collaboration tools do not support all capabilities, and there is additional value we can provide when SRE/NetOps teams are doing a deep dive. The Selector AI expert portal allows customers to drill down to raw data, see many different data types on a single timeline, and initiate resolution actions. Invoking the expert portal from within collaboration tools is easy: the transition from immersive collaboration to the expert tool is seamless via a contextual hyperlink. This hyperlink can also be used to enrich ticketing systems.

Conclusion

There is powerful functionality directly integrated into tools like Slack and Microsoft Teams to achieve Selector AI immersive collaboration.
There are also times when operations experts need capabilities not enabled in collaboration tools, and with Selector AI that transition is seamless: the expert portal is invoked directly from collaboration tools and renders the same in both environments. For SRE/NetOps experts, natural language queries are simple but powerful — AI is now usable and understandable. When a resolution is decided, action can be taken directly from Selector AI. For all team members, regardless of their skills, experience, and role, there is a Selector UX fit for the job: AI/ML for operations novices and AI/ML for operations experts.

AIOps: The Next Generation of Data Whisperers

Shoutout to Surya Nimmagadda and Alex Lau for their collaboration on this blog.

Next Generation AIOps

Two simultaneous and dramatic shifts combine in next generation AIOps: ease of use and data-centric learning. This blog focuses on data-centric learning, while a future blog will focus on collaboration. The Selector AI AI/ML analytics engine has four foundational differentiators:

Zero Touch Operation

Early attempts at applying AI/ML to network operations were hampered by complex, time-consuming setup and maintenance. At best, this resulted in a lengthy period before any value was realized. Often the result was worse: customers giving up on a tool. The promise of AI/ML was not realized. Selector AI is a next generation, operations-centric AIOps solution, designed specifically to address the issues of previous generations: the power of AI/ML in a fully automated, no-config solution with powerful collaboration integration. Some network operations teams have a large number of people and the ability to invest heavily in new technologies. Most operations teams do not have the necessary AI/ML investment capacity or skillset. For these teams, a simple-to-use solution with common integrations is essential. Network operations teams need not be exposed to the intricacies of data ingestion, data normalization, AI/ML workflow construction, AI/ML algorithms, or data query. Operation must be simple, intuitive, and powerful.

Any Data Analysis

Many tools are focused on numerical data, for example metric measurement. Sometimes tools are also focused on a specific aspect of networking, for example traffic analysis. While these approaches have had their uses, they fail to leverage all the data that is available to create holistic, easy-to-understand, and actionable insights. One characteristic of a next generation AIOps solution is the ability to rapidly identify root cause across multiple data types.
For example, Selector AI already learns from logs, events, configuration, and more. It is not just ingesting data and reformatting it for visual display, but normalizing, augmenting, correlating, clustering, and filtering. Root cause is ranked and easy to see because the noise is filtered out. Goodbye monitoring, hello observability and actionable insight.

Dynamic Thresholding

Thresholds are among the biggest problems for network operations teams, who are faced with two difficult alternatives: accept ineffective heuristic thresholds and be overwhelmed with false alerts, or hand-craft thresholds for an exploding number of endpoints. There is an important third alternative delivered by Selector AI: dynamic thresholding. Instead of configuring and constantly tuning millions of thresholds to the precise level required to be effective, dynamic thresholding learns normal. Anomalies then become statistical variations from an established baseline. Importantly, Selector AI can scale this approach to many millions of thresholds, a necessity in today's virtual, overlay, and IoT networks.

Root Cause Ranking

Selector AI's automated ranking algorithm rapidly draws operations attention to the resources that are most likely diverging from normal and most often indicated as participants in a network issue. Both of these approaches automate the sifting of enormous amounts of data in addition to connecting the dots within and across data types. The Selector AI expert analysis tool also provides timelines and analysis of how different data types are associated: for example, where a configuration change sits on a timeline, and whether problems started to arise as a result of that change.

Conclusion

Next generation AIOps from Selector AI is fundamentally about a paradigm-shifting change in experience and analytics. One critical piece of that experience is zero touch operations: end-to-end automation of data collection, analysis, and root cause ranking.
The core modeling is consistent regardless of the data type, so only data declarations need to change, and declarations are only needed when a data source is not already pre-integrated. No workflow construction is required. No normalization, filtering, correlation, or clustering code has to be written. Simple, automated, and insightful: a new generation of analytics without the learning curve. You can still be a data scientist, but you do not have to be. Let Selector AI be your data whisperer and dramatically reduce the overhead of setting thresholds, ingesting data, connecting the dots, and ranking root causes.
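The dynamic thresholding and root cause ranking described earlier can be sketched in a few lines. The window size, z-score cutoff, and data below are illustrative assumptions, not Selector AI's actual algorithms:

```python
import statistics

# Minimal sketch: learn "normal" from recent history, flag statistical
# outliers instead of using hand-set thresholds, then rank resources by
# how often they diverge from their own baseline.

def is_anomaly(history, value, z_cutoff=3.0):
    """Flag values more than z_cutoff standard deviations from the baseline."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1e-9  # guard against zero variance
    return abs(value - mean) / stdev > z_cutoff

# Hypothetical per-resource metric streams: the first 8 points form the baseline.
streams = {
    "router-chicago": [10, 11, 10, 12, 11, 10, 11, 10, 95],  # sudden spike
    "router-nyc":     [20, 21, 20, 22, 21, 20, 21, 20, 21],  # steady
}

# Root cause ranking: count baseline divergences per resource.
anomaly_counts = {
    name: sum(is_anomaly(vals[:8], v) for v in vals[8:])
    for name, vals in streams.items()
}
ranking = sorted(anomaly_counts, key=anomaly_counts.get, reverse=True)
print(ranking[0])
# prints router-chicago
```

Note that no threshold was configured for either router: each resource is judged against its own learned baseline, which is what lets this approach scale to millions of endpoints.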
