In this clip from our Networking Field Day 30 presentation, Nitin Kumar, Selector Co-founder and CTO, dives into how Selector converts metrics and logs to the common currency of events.
Video length: 4:53
Speaker: Nitin Kumar, Co-founder and CTO, Selector
Networking Field Day 30 Selector AI
A Deep Dive into Selector AI
January 20, 2023
Now let’s get to the more interesting part—converting metrics and how you correlate them. The first basic problem is metrics are fundamentally numbers, just numbers—billions and billions of numbers. Logs are English sentences.
If you want to correlate them together, they’re apples and oranges. You cannot correlate them. Metric correlation itself is a hard problem, granted, but now you’re looking at metrics and logs that are fundamentally different classes of information. Some technology has to be built to unify them. Some common currency has to be built so that you can say: ah, now I can compare these things.
Let’s see how we do the metrics and log stuff. The metrics say … so this is the metrics pipeline. How do we go from numbers into a color? On the left, you have these numbers. This chart I’m showing is a latency chart as a data set. Latency is from point X to point Y. These latencies have different numbers. Then, we have to convert them into good numbers or bad numbers.
For example, here, a number like 174 is still considered green, good. A number like 17, which is smaller, is red. This is where the ML (machine learning) comes into play. It says that 174 is normal for this connection. However, 17 is not normal for this connection. Fifty-six is getting closer to being abnormal. We generally have three colors: red, yellow, and orange … although orange and red are considered the same category.
How does this baselining thing work? There are two aspects to it as the data comes in. In the machine learning space, you’ll always see a training pipeline, which is shown at the top, and an inference pipeline, shown at the bottom. The data gets fed into the training pipeline as well as the inference pipeline. The inference pipeline is kind of lagging behind the training pipeline. The training pipeline does its training; it builds its model and makes it available to the inference pipeline. You can think of the model as a giant lookup store. You look up into it and you get a value back, although it’s not as simplistic as that.
This model in the middle is continuously being built by the training store. As data is coming in, this training store is looking at data samples. It is going back in time. It is looking at the current data sample and the threshold value. It does all of that, and it starts writing into the model part. Then the real-time inference part is just comparing the computed threshold with the actual value and coming up with an answer. If it doesn’t show up on this screen … if you see the thing on the right, there is a faded line that is sort of the baseline of the chart. Then there are these spikes in the middle. Those spikes in the middle get flagged as red lines in the chart.
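The split between a training path that writes thresholds into a model and an inference path that only looks them up can be sketched roughly as follows. This is a minimal illustration, not Selector's implementation: the names `BaselineModel`, `Trainer`, and `infer`, the rolling window, and the mean plus or minus k standard deviations rule are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


class BaselineModel:
    """Hypothetical 'lookup store': connection key -> (low, high) bounds."""

    def __init__(self):
        self.bounds = {}

    def lookup(self, key):
        return self.bounds.get(key)


class Trainer:
    """Training path: looks back over recent samples and writes bounds."""

    def __init__(self, model, window=50, k=3.0):
        self.model = model
        self.window = window  # assumed look-back length
        self.k = k            # assumed width of the normal band
        self.history = {}

    def observe(self, key, value):
        samples = self.history.setdefault(key, deque(maxlen=self.window))
        samples.append(value)
        if len(samples) >= 2:
            mu, sigma = mean(samples), stdev(samples)
            self.model.bounds[key] = (mu - self.k * sigma, mu + self.k * sigma)


def infer(model, key, value):
    """Inference path: compare the actual value with the computed bounds."""
    bounds = model.lookup(key)
    if bounds is None:
        return "green"  # assumption: nothing is flagged before a baseline exists
    low, high = bounds
    return "green" if low <= value <= high else "red"


model = BaselineModel()
trainer = Trainer(model)
for latency in [170, 172, 174, 176, 178]:
    trainer.observe("X->Y", latency)

print(infer(model, "X->Y", 174))  # within the learned band for this connection
print(infer(model, "X->Y", 17))   # far outside the learned band
```

Because the bounds are learned per connection, a value is judged against what is normal for that connection rather than against a fixed global limit.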
If you look at the behavior of the underlying metric, it’s just a sea of greens, then a red, a sea of greens, and a red. That’s the picture I showed initially, where you had white things that are just the numbers. Then, because of this technology that we’ve built, those numbers get converted into red elements and green elements.
Another interesting part is the seasonality implementation over here. You know that, yes, it’s going to spike up at a point in time, but that spike is normal. There is a seasonality process running that is looking at the numbers over a larger period of time. It recognizes that in the morning at 8 AM, things are going to spike up, so it then artificially pulls up the threshold, like don’t flag this as a red because I’ve seen this happen.
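One way to picture a seasonality process that pulls up the threshold is a per-hour profile built from a longer history. The sketch below is only an illustration under assumptions: the function names, the hour-of-day bucketing, and the 1.2 margin are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """samples: iterable of (hour_of_day, value). Returns mean value per hour."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(values) for hour, values in by_hour.items()}


def seasonal_threshold(base_threshold, profile, hour, margin=1.2):
    """Raise the threshold when this hour is usually high in the long-term data."""
    expected = profile.get(hour)
    if expected is None:
        return base_threshold
    return max(base_threshold, expected * margin)


def classify(value, base_threshold, profile, hour):
    limit = seasonal_threshold(base_threshold, profile, hour)
    return "red" if value > limit else "green"


# Assumed history: the metric regularly spikes at 8 AM and is quiet at 3 AM.
history = [(8, 300), (8, 320), (8, 310), (3, 100), (3, 110), (3, 90)]
profile = hourly_profile(history)

print(classify(305, 150, profile, hour=8))  # the usual morning spike is tolerated
print(classify(305, 150, profile, hour=3))  # the same value at 3 AM is flagged
```

The threshold is only ever raised here, never lowered, which matches the idea of not flagging a spike that has been seen before.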
We kind of look at seasonality … if you look at the human brain, there is a system one behavior, where you look at something and you instantly react. That’s what real-time baselining is. Seasonality is system two behavior, where you go back, you think about it, and that influences your behavior.
Now we have numbers that we converted into events. Now we have a red event and a green event. That’s how we converted numbers into this common currency of events. We have to do the same process for logs.
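As a rough picture of what a "common currency" record might look like, the sketch below wraps a classified metric sample in an event. The `Event` fields and the `metric_to_event` helper are hypothetical, chosen for illustration only; the talk does not specify the event schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # when the sample was taken
    entity: str       # what it is about, e.g. a connection
    source: str       # where it came from; "metric" in this sketch
    color: str        # "green" or "red"


def metric_to_event(timestamp, entity, value, low, high):
    """Turn a raw number into an event using already-computed bounds."""
    color = "green" if low <= value <= high else "red"
    return Event(timestamp, entity, "metric", color)


event = metric_to_event(1674200000.0, "X->Y", 17, low=164.5, high=183.5)
print(event)
```

Once every signal is reduced to a record of this shape, events from different sources can be compared on the same fields instead of as raw numbers versus sentences.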