Selector’s VP Product Management, Deba Mohanty, and Co-founder and CTO, Nitin Kumar, explore how to enhance network observability with Selector. They explain how Selector enables users to collect data, correlate it, get the right insights, and share it with the team.
Video length: 29:56
Speakers: Deba Mohanty, VP Product Management, Selector AI; Nitin Kumar, Co-founder and CTO, Selector AI
Networking Field Day 30 Selector AI
Introduction to Selector AI
January 20, 2023
Enhance Observability and Correlation for Next-Generation Networks with Selector AI
Deba Mohanty: Today’s topic is how to use Selector AI to get enhanced observability into your network. Before I start, let me introduce myself. I lead the solutions team at Selector. I’ve been at Selector for about three years now and have held multiple roles. Prior to that, I was at AWS and at Juniper with Nitin and our other co-founder Kanan.
Today I will primarily introduce you to the product and the customer deployments we already have. Before I talk about the specifics of the product and how we solve the problem, let me highlight some of the key challenges that we saw in our own experience and in talking to our customers.
The first thing we noticed is that whenever network operators or … just the operations team look at any kind of outage or issue, they have to look at multiple kinds of data sources to figure out what’s happening. Yes, metrics are one of the key things they look at. But for contextual information, they look at configuration events, alerts, and logs. Logs have a lot of good information, so they go and browse those as well.
What happens is that, when they’re looking at these different data sources, they’re looking at them in silos. Because there are multiple monitoring tools in the market, you can look at just metrics, just logs, or just events. They look at this multitude of dashboards and try to do some kind of manual analysis. Over years of experience, they have built up insights, so they look for those signals in different dashboards and then figure out what’s happening.
But then once they get to that point, they have to communicate that to different teams. There could be an application team; there could be a networking team; there could be an operations team. Everyone needs to know what’s happening. After that, they decide on a fix and run the playbook.
In this overall process, getting the data, doing manual analysis, and sharing it with the teams takes the most time during an outage.
Fixes usually have some kind of a playbook. They’ll go to a restart, a reset, or some kind of action to get the services back, but the majority of the time is spent in this upfront work of analyzing the data.
So, what we do is collect heterogeneous data, like all the different sources of data listed here, though not limited to these. As we expand, we’re getting more data ingested into the platform. We then do the automatic correlation of these data sources and get insights.
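To make the idea of correlating heterogeneous sources concrete, here is a minimal Python sketch. It is not Selector’s implementation; the `Event` type, the `correlate` function, the fixed 60-second window, and the sample events are all hypothetical, chosen only to illustrate grouping metrics, logs, and configuration events by device and time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str     # hypothetical source type: "metric", "log", or "config"
    device: str
    timestamp: int  # seconds since some epoch
    message: str


def correlate(events, window_seconds=60, min_sources=2):
    """Group events by (device, time bucket) and keep only the groups
    in which at least `min_sources` distinct source types appear."""
    buckets = defaultdict(list)
    for ev in events:
        key = (ev.device, ev.timestamp // window_seconds)
        buckets[key].append(ev)

    correlated = {}
    for key, group in buckets.items():
        if len({ev.source for ev in group}) >= min_sources:
            correlated[key] = sorted(group, key=lambda ev: ev.timestamp)
    return correlated


# Made-up sample data for illustration only.
events = [
    Event("config", "router-1", 1020, "BGP policy changed"),
    Event("log", "router-1", 1030, "BGP neighbor down"),
    Event("metric", "router-1", 1045, "packet loss rising"),
    Event("metric", "router-2", 1050, "cpu utilization normal"),
]

result = correlate(events)
for (device, bucket), group in result.items():
    print(device, bucket, [ev.message for ev in group])
```

Fixed time buckets are a simplification: two related events that straddle a bucket boundary would not be grouped, so a real system would likely need something more robust.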
Once we get that insight, we can share that using collaborative platforms like Slack or Microsoft Teams. When we started the company, it was in the middle of the pandemic, everyone was remote, and all operations were happening remotely. That was the biggest pain point we saw: how do folks share the information among themselves? We integrated with Slack and Microsoft Teams where folks can ask questions in natural language, get the response … our bot is a part of that channel … and do the debugging collaboratively.
The three main pillars of our product are the collection of data, correlating the data and getting the right insights, and sharing it with the team. We’ll focus more on the correlation part. Nitin will do a demo to show how the product works, but these are the key building blocks of the product.
This is one of the examples we saw at one of the large service providers. In the middle of the night, something happens. Folks scramble to find what happened, and they have to figure out which particular dashboard to look at. They have like 50-odd dashboards for metrics, 20-odd dashboards for logs, and a few more for applications.
But as soon as you get to the right dashboards, folks then have to write the right kind of structured query, a SQL construct, to get the data back. The folks on the front line, trying to figure out what’s happening, may not know the exact SQL to get that insight. So, they go back and forth between this multitude of dashboards and try to figure out what’s happening.
Our primary focus in building the product is to provide curated, contextual, effective answers when someone asks questions like: What are the top issues? What are the top applications having issues? What are the top devices having problems? Which are the top links having issues? Those are the actionable insights that we provide.
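As a rough illustration of what answering a “top devices” or “top links” question involves, the Python sketch below ranks made-up alert records by how often a field value appears. The `top_n` helper, the field names, and the sample alerts are assumptions for illustration and do not reflect Selector’s data model or query interface.

```python
from collections import Counter


def top_n(alerts, field, n=3):
    """Return the n most frequent values of `field` across the alerts,
    as (value, count) pairs, most frequent first."""
    counts = Counter(alert[field] for alert in alerts if field in alert)
    return counts.most_common(n)


# Made-up alert records for illustration only.
alerts = [
    {"device": "router-1", "link": "r1-r2", "application": "video"},
    {"device": "router-1", "link": "r1-r3", "application": "voice"},
    {"device": "switch-4", "link": "r1-r2", "application": "video"},
    {"device": "router-1", "link": "r1-r2", "application": "video"},
]

top_devices = top_n(alerts, "device", n=2)
top_links = top_n(alerts, "link", n=1)
print(top_devices)
print(top_links)
```

The point of the sketch is that the ranking itself is simple; the hard part described in the talk is getting curated, correlated records to rank without the operator writing a query by hand.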
And because we store these insights for a longer period of time, you can go back and forth. It’s like a DVR: you can go into the past, see what happened and which things are correlated, and move back and forth to see what’s happening.