Blog

AIOps for Networking: See the real anomalies

AIOps quickly sorts through a maze of logs, telemetry, configuration changes, and other information to pinpoint the probable cause. Other approaches leave NetOps/SRE teams drowning in a sea of false alerts, different monitors, and static visualizations.

The Problem

There is a paradigm shift in the way services are delivered — monolithic applications moving to distributed applications spread across hybrid clouds. The excellent news about operations trends is that more devices and more endpoints are being monitored. The bad news is that systems that simply monitor metrics, create an operations nightmare.

Operations endpoints are physical, virtual, and functional (microservices). The number of endpoints is expanding at a rapid rate. A screen full of monitored endpoints is not observable by people.

What about only reporting endpoints that have exceeded thresholds? In theory, this does reduce the number of alerts. In practice, most systems use static heuristics. All links should have X set as the threshold. All microservices should have Y set as the threshold. These heuristics, either vendor-supplied or manually configured by operations teams, generate false alarms. The appropriate threshold for one link, device, or microservice is not applicable for another. Worse still, operations teams have to accept the defaults set by vendors or set and maintain a rapidly growing number of thresholds.

The Selector AI Solution

Selector AI uses machine learning to set the appropriate threshold for each endpoint automatically. Endpoints do not have to be configured. Endpoints are identified in streaming data, and learning commences automatically. As thresholds are appropriately set for each endpoint, threshold violations are more realistic, the number of false alerts is dramatically reduced, and it is much easier and faster to see the real anomalies. Selector AI also identifies outliers in a distribution of similar endpoints.

Conclusion

Selector AI has proven in real production networks that AIOps machine learning dramatically reduces the time to see real anomalies by eliminating the false alerts that are generated by static thresholds that do not adjust to changing conditions, based on heuristics which are one-size fits all approaches. Selector AI eliminates the setup and maintenance of thousands of thresholds.

Selector AI also supports static & binary thresholds for those operations teams that still wish to use them in addition to Selector AI’s machine-learned baselining thresholds and visualizations.