Getting the whole team on the same page instantaneously
One or more significant resources are down, but the information is buried under thousands of alarms. Operations team members jump on a conference call, everyone on the call is looking at a different slice of the operations data surface, and chaos breaks out as the team desperately tries to work out, as quickly as possible, where to even begin triage. Worse still, 10, 20, 30 or more minutes may be wasted triaging the wrong issues or going down ratholes. Imagine a tool that could correlate across all operations information and immediately tell every team member where to start triage, eliminating the chaos and wasted time. Now there is one.
Any Data Correlation
One of the reasons so many people jump on a conference call is that the causes of outages and performance degradation are so varied. There is no way of knowing a priori which piece of information will be the golden nugget that creates the aha moment for an anomaly.
Tools that are designed from top to bottom with inherent assumptions about a specific source and/or type of data may be great point diagnostic tools. However, they fail to provide insight when the type of data or diagnostic they specialize in reveals nothing about the anomaly an operations team is experiencing.
A new generation of tool is now available: the Any Data AIOps tool, which can ingest configurations, alarms, metrics, events, logs, inventory, and use-case-specific context. Operations information is correlated across previously siloed data and tools, connecting the dots and pointing operations teams toward where triage should begin. This tool does not replace experienced and skilled operations teams. It gets them to the triage starting line much faster, saving tens of minutes to hours of pre-triage time across many team members, and shortening service downtime.
A Platform and a Tool
While we described the above as a new generation of tools, when properly architected they are actually powerful platforms. The power of these platforms comes from being data, cloud, and screen agnostic.
The platform design does not assume any one data type, and easily processes time series data, event data, metrics, text-based data, or any other data. This enables all operations data to be centrally analyzed and correlated, connecting the dots, and allowing insights to emerge across the entire operations data surface.
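To make the idea of data-agnostic correlation concrete, here is a minimal Python sketch. It is an illustration only and does not describe how any particular AIOps product works. The `Record` envelope, its field names, the `rank_triage_candidates` function, and the sample resources are all hypothetical. The scoring rule (count how many distinct data sources mention a resource inside one time window) is a deliberately simple stand-in for real correlation logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical common envelope for any kind of operations record."""
    source: str          # e.g. "alarms", "metrics", "logs", "config"
    resource: str        # identifier of the resource the record refers to
    timestamp: datetime  # when the record was produced
    detail: str          # free-form payload, kept opaque on purpose


def rank_triage_candidates(records, window_start, window):
    """Rank resources by the number of distinct sources reporting on them
    within [window_start, window_start + window)."""
    window_end = window_start + window
    sources_by_resource = defaultdict(set)
    for rec in records:
        if window_start <= rec.timestamp < window_end:
            sources_by_resource[rec.resource].add(rec.source)
    # Assumption: more independent sources pointing at the same resource
    # is a stronger hint about where triage should begin.
    return sorted(
        ((res, len(srcs)) for res, srcs in sources_by_resource.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Sample data (invented for illustration).
t0 = datetime(2024, 1, 1, 12, 0)
records = [
    Record("config", "router-7", t0 + timedelta(seconds=30), "ACL changed"),
    Record("alarms", "router-7", t0 + timedelta(minutes=1), "link down"),
    Record("metrics", "router-7", t0 + timedelta(minutes=2), "packet loss high"),
    Record("logs", "switch-2", t0 + timedelta(minutes=4), "port flap"),
    Record("alarms", "switch-2", t0 + timedelta(minutes=20), "fan warning"),
]

ranked = rank_triage_candidates(records, t0, timedelta(minutes=10))
for resource, source_count in ranked:
    print(resource, source_count)
```

In this sketch the platform never needs to know what an alarm or a metric is. Every record is reduced to the same envelope, which is what allows previously siloed data to be compared in one place.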
Being truly cloud agnostic means not relying on any proprietary cloud platform APIs or services. With this flexibility and portability, the platform can easily port to any public, private or hybrid cloud environment. Cloud agnostic is different from cloud native. Cloud native might be specific to one cloud implementation whereas cloud agnostic can execute in any cloud environment.
A post-pandemic world has changed work realities. While people are returning to some workplace offices, now more than ever, it is important to have operations intelligence available to skilled and experienced employees, whether they are sitting in front of a large monitor, or working on a laptop, tablet, or smartphone, and regardless of whether they are at work, home, or mobile.
Eliminating Chaos and Wasted Time
Instead of 5, 10, 50 or more people getting on a conference call, all team members can now see analysis and correlations across the entire operations data surface, regardless of what device they are using, and see where triage needs to begin. The most suitable experts can be allocated to the most productive investigation, dramatically reducing team member time, mean time to detect (MTTD), mean time to repair (MTTR), and network downtime. Team members are more productive, and network service users are less impacted.
Tools that examine only one type of data, or a limited number of data types, may be good triage tools, but they cannot identify anomalies in parts of the operations data surface that they have no visibility into. Any Data AIOps platforms that span the entire operations data surface can connect the dots and identify where triage should begin more reliably than existing tools, and faster than team members doing this manually under the stress of network downtime and/or performance degradation.