
Critical Inflection for AIOps

AIOps for networks has reached a critical inflection point in adoption. Industry consensus, mature technology stacks, and flexible cloud-native approaches now make it possible to address long-unmet needs. These solutions arrive just as the velocity, volume, and variety of data have rendered older-generation tooling inadequate to the task.

Cloud-Native

For some NetOps teams, the convenience of a public-cloud-delivered model is best. For other teams, an on-prem model can achieve lower costs. Today’s market requires a software architecture that is the same across public, private, and hybrid clouds. A common approach that enables this is cloud-native microservices. Cloud-native AIOps solutions also align with enterprises and service providers (SPs) that are adopting that approach across their IT infrastructure.

Kubernetes-based container architectures deliver not only deployment flexibility but also technical advantages, such as the ability to tailor different parts of the solution to different needs, drivers, and constraints. A microservices-based architecture can meet considerations such as performance, scale, and extensibility because each service can use Golang, Python, YAML, or other technologies as needed, independent of the rest of the solution. Older-generation monolithic software architectures cannot offer this independence.

Mature Technologies & Techniques

The introduction of cloud-native tooling is one reason AIOps is at a critical inflection point. Another is the maturing of the data science ecosystem. Multiple mature data science libraries are now available, written in commonly used languages such as Python.

As a result of this maturity, data science technologies can be easily integrated into cloud-native AIOps solutions. The machine learning techniques built on these technologies have also matured. As new algorithms and techniques are developed, they too can be readily incorporated.

Industry Consensus

Many technologies have competed for the position of de facto standard across IT infrastructure. A mere 2–5 years ago, it was not clear which technologies would be the winners and losers in important areas such as containers and data science. Those wars have been fought, and the winning technologies are clear: Python, pandas, Kubernetes, YAML, and more. This industry-wide consensus enables rich ecosystems to evolve, further accelerating innovation and de-risking technology choices.

Architecture Evolution

Monitoring tools have evolved with architecture over time. With the first generation of monolithic applications and fixed network design, operators used legacy tools to monitor infrastructure KPIs such as server uptime, load, and network connectivity. The monitoring process involved scanning important log messages from tools such as nmon and MRTG.

The second generation of monolithic applications ran on VMs and over virtual networks. Operators started using more advanced tools such as Cacti and Nagios to monitor infrastructure performance as well as business KPIs such as the number of transactions. Some monitoring tools started using dedicated appliances to stream data from their agents, which were installed alongside the monolithic applications.

With the emergence of microservices-based architectures, installing agents has become resource-prohibitive, and the underlying network path is ephemeral. AIOps solutions can adapt to these new realities in terms of both approach and scale.

Conclusion

Networks can no longer be operated effectively with current-generation tooling. The volume, velocity, and variety of data are too great for that tooling and for the manual interventions it requires. Flexible delivery models are needed: public cloud, private cloud, and hybrid cloud, enabling convenience and economics as appropriate. Flexible software architectures are also needed, enabling modularity, the use of the best technology for each part of the architecture, and the adoption of CI/CD practices as needed to ensure feature velocity. Manually configuring thresholds, or accepting defaults that produce bad outcomes, is no longer viable for network operations teams that are transitioning from ensuring availability to assuring the performance levels required in multi-cloud and hybrid-cloud environments.
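To make the point about manually configured thresholds concrete, here is a minimal sketch in Python that contrasts a fixed threshold with a baseline learned from recent data. It is an illustration only, not a description of any particular AIOps product: the function names, the window of 12 samples, the multiplier of 3 standard deviations, and the latency values are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def static_anomalies(samples, threshold):
    """Flag indices whose value exceeds a fixed, manually chosen threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def adaptive_anomalies(samples, window=12, k=3.0):
    """Flag indices that deviate more than k standard deviations from the
    mean of the preceding `window` samples (a rolling baseline)."""
    history = deque(maxlen=window)  # keeps only the most recent samples
    flagged = []
    for i, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # A perfectly flat window gives no spread estimate, so this
            # sketch skips it rather than flagging every small change.
            if spread > 0 and abs(value - baseline) > k * spread:
                flagged.append(i)
        history.append(value)
    return flagged


# Hypothetical link latency in milliseconds: steady near 20 ms, one jump to 35 ms.
latency_ms = [20, 21, 19, 20, 22, 20, 19, 21, 20, 20, 21, 19, 35, 20, 21]

fixed = static_anomalies(latency_ms, threshold=100)  # a generic default
learned = adaptive_anomalies(latency_ms)
```

With these made-up numbers, the fixed 100 ms threshold flags nothing, while the rolling baseline flags the jump to 35 ms because it is far outside what this particular link normally shows. Production systems would typically use more robust statistics and account for seasonality, but the contrast is the same: the threshold is derived from the data instead of being configured by hand.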

Increased complexity, cloud-native adoption, technology flexibility, AI/ML maturity, and the need to quickly identify and resolve service-impacting issues are all driving the adoption of AIOps, resulting in a critical inflection point.