ETL That HEALS

Surya Nimmagadda
May 18, 2021

Extract, transform, and load (ETL) is a common procedure in AI/ML and data science pipelines, and it is critical to the strength and efficacy of the resulting analytics. In AIOps solutions, hierarchical data must be navigated, relevant data extracted, and real-time data augmented with curated data. The result is normalized, transformed data that is loaded into a knowledge graph. When this can be done easily and with good performance, ETL becomes a critical foundation for an AIOps solution that HEALs: Hierarchical, Extraction, Augmentation, Transformation, and Loading.

Declaration

The structure of the data should be separated from the logic that parses and interprets it. Ideally, there is one ETL engine, and that engine does not require code changes or code releases every time a new data source is added.

This is achieved through ETL declarations written, for example, in a language like YAML. No code changes to the ETL procedure are required, and no new software releases have to be coded, tested, and installed. Instead, a simple YAML text file is created or modified.
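
To make the separation concrete, here is a minimal sketch of a declaration-driven engine, assuming a simple field-mapping declaration. The declaration format, its field names, and the `run_etl` and `get_path` helpers are hypothetical, not Selector AI's actual schema. The declaration is written in JSON syntax, which YAML 1.2 parsers also accept, so the example runs with only the Python standard library.

```python
import json

# Hypothetical declaration for one data source (JSON syntax, also valid YAML 1.2).
# It maps output field names to dotted paths inside each raw record.
DECLARATION = """
{
  "source": "interface_counters",
  "fields": {
    "device": "hostname",
    "interface": "ifName",
    "in_octets": "counters.inOctets"
  }
}
"""


def get_path(record, dotted):
    """Follow a dotted path such as 'counters.inOctets' into nested dicts."""
    value = record
    for key in dotted.split("."):
        value = value[key]
    return value


def run_etl(declaration_text, records):
    """Generic engine: all source-specific knowledge lives in the declaration."""
    decl = json.loads(declaration_text)
    out = []
    for record in records:
        row = {name: get_path(record, path) for name, path in decl["fields"].items()}
        row["source"] = decl["source"]
        out.append(row)
    return out


raw = [{"hostname": "sw1", "ifName": "eth0",
        "counters": {"inOctets": 1200, "outOctets": 900}}]
print(run_etl(DECLARATION, raw))
```

In this sketch, supporting another source means writing another declaration; `run_etl` itself does not change. Fields the declaration does not mention (here `outOctets`) are simply never extracted.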

Information Entropy

The value of a correlation solution is proportional to the information entropy of the data it works with. Increasing entropy is not a matter of increasing the number of bits, but of increasing the relevant, non-redundant information.
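
The post does not say which measure of entropy it means. Assuming Shannon entropy over the values a source emits, a short calculation illustrates the point: duplicating events doubles the bits but leaves the entropy unchanged, while adding new, distinct events raises it. The event names are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


events = ["link_down", "high_cpu", "link_down", "bgp_flap"]
redundant = events * 2                          # same information, twice the bits
richer = events + ["fan_fail", "disk_full"]     # new, non-redundant values

print(shannon_entropy(events))     # 1.5
print(shannon_entropy(redundant))  # 1.5
print(shannon_entropy(richer))     # higher than 1.5
```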

There are two major contributors to good information hygiene and entropy: extracting only what is needed, and augmenting what is extracted with other relevant information, for example from inventory databases.

Hierarchical

Supporting a broad range of data sources requires support for both hierarchical and flat data. Hierarchy in particular is an inherent attribute of operational metrics, because it reflects how the elements of operations are organized (sites, racks, switches, ports, CPU, memory, etc.). Selector AI ETL uses YAML + GraphQL to extract metrics from this hierarchically structured data, along with the metadata that captures the hierarchy.
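
The post states that Selector AI uses YAML + GraphQL for this; the sketch below does not reproduce that mechanism. It is a plain recursive traversal, assuming a nested site, rack, switch, port layout with hypothetical names, that shows the underlying idea: each flattened metric keeps the hierarchy as label metadata instead of losing it.

```python
def flatten_metrics(node, levels, labels=None):
    """Walk nested site -> rack -> switch -> port data and emit flat metric
    records, each labeled with where it sat in the hierarchy."""
    labels = labels or {}
    if not levels:
        # Leaf: node is a dict of metric name -> value.
        return [dict(labels, metric=name, value=value) for name, value in node.items()]
    level, rest = levels[0], levels[1:]
    records = []
    for name, child in node.items():
        # Record this level's name as a label, then descend one level.
        records.extend(flatten_metrics(child, rest, {**labels, level: name}))
    return records


# Hypothetical telemetry snapshot.
telemetry = {
    "sjc1": {                                   # site
        "rack7": {                              # rack
            "leaf-01": {                        # switch
                "eth1": {"in_errors": 0, "util_pct": 41.5},   # port metrics
                "eth2": {"in_errors": 3, "util_pct": 88.0},
            }
        }
    }
}

for rec in flatten_metrics(telemetry, ["site", "rack", "switch", "port"]):
    print(rec)
```

Each output record, for example the one for `eth2` utilization, carries `site`, `rack`, `switch`, and `port` labels, so a downstream query can still group or correlate by any level of the hierarchy.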

Extraction

Extraction is a critical aspect of data ingestion and overall data analysis. Selector AI adapts easily to many different types of data, which is critical to achieving the best possible correlations and actionable insights. Because the YAML + GraphQL declarations are flexible, supporting a new data source does not require code changes or new software releases. As a result, a wide variety of data sources become usable quickly, time to value is short, and the code is easier to maintain, because processing a new data type does not introduce new bugs into it. Using GraphQL for data selection, rather than in its usual role in client/server data transfer, is a novel application of GraphQL in AIOps.
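
As an illustration of using a GraphQL-style selection set to select data, rather than to transfer it between client and server, here is a hypothetical mini-implementation. It is not a GraphQL library and not Selector AI's implementation: the parser handles only field names and nested braces (no arguments, aliases, or fragments), and the inventory data is invented. As in GraphQL, a selection on a list field is applied to each element.

```python
import re


def parse_selection(text):
    """Parse a minimal GraphQL-style selection set, e.g.
    '{ devices { name interfaces { name oper_status } } }', into a nested
    dict where None marks a scalar field."""
    tokens = re.findall(r"[{}]|\w+", text)
    pos = 0

    def parse_set():
        nonlocal pos
        if tokens[pos] != "{":
            raise ValueError("expected '{'")
        pos += 1
        fields = {}
        while tokens[pos] != "}":
            name = tokens[pos]
            pos += 1
            if pos < len(tokens) and tokens[pos] == "{":
                fields[name] = parse_set()   # nested selection
            else:
                fields[name] = None          # scalar leaf
        pos += 1                             # consume '}'
        return fields

    return parse_set()


def select(data, selection):
    """Keep only the selected fields; missing fields are skipped."""
    if selection is None:
        return data
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    return {name: select(data[name], sub)
            for name, sub in selection.items() if name in data}


inventory_dump = {
    "devices": [
        {"name": "leaf-01", "serial": "X1", "firmware": "9.3",
         "interfaces": [{"name": "eth1", "oper_status": "up", "mtu": 9216}]},
        {"name": "leaf-02", "serial": "X2", "firmware": "9.2",
         "interfaces": [{"name": "eth1", "oper_status": "down", "mtu": 1500}]},
    ]
}

query = "{ devices { name interfaces { name oper_status } } }"
print(select(inventory_dump, parse_selection(query)))
```

The selection prunes `serial`, `firmware`, and `mtu` while preserving the device and interface nesting, which is the "extract only what is needed" behavior described above, expressed as data rather than code.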

Augmentation

What the best AIOps solutions have quickly discovered is that processing measurements, telemetry, and similar data alone does not provide the context needed for high-value correlations. Which region is the data coming from? What equipment is involved? What information in inventory databases is relevant to correlation?

The Selector AI solution is pre-integrated with out-of-band information sources such as Netbox and CSV files. This augments real-time streaming data with essential context to deliver high-fidelity correlations.
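
A minimal sketch of augmentation, assuming the inventory is available as CSV with a `device` column to join on. The column names and data are hypothetical; a Netbox integration would fetch equivalent records from Netbox instead, which this sketch does not attempt.

```python
import csv
import io

# Hypothetical inventory export; column names are illustrative.
INVENTORY_CSV = """device,site,region,role
leaf-01,sjc1,us-west,leaf
spine-01,iad2,us-east,spine
"""


def load_inventory(csv_text):
    """Index inventory rows by device name for constant-time lookups."""
    return {row["device"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def augment(events, inventory):
    """Attach inventory context to each streaming event. Events for devices
    missing from inventory pass through unchanged rather than being dropped."""
    out = []
    for event in events:
        context = inventory.get(event["device"], {})
        extra = {k: v for k, v in context.items() if k != "device"}
        out.append({**event, **extra})
    return out


stream = [
    {"device": "leaf-01", "metric": "util_pct", "value": 88.0},
    {"device": "edge-09", "metric": "util_pct", "value": 12.0},
]
for row in augment(stream, load_inventory(INVENTORY_CSV)):
    print(row)
```

Passing unknown devices through unchanged is one design choice; a stricter pipeline might instead flag them so the inventory can be corrected.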

Loading

If the correlation, visualization, and operator query engines load too much information, performance degrades, so only what is relevant should be extracted. If too little information is loaded, correlation quality degrades. Loading the right balance of extracted and augmented information is essential to a great AIOps UX.

The Selector AI data hypervisor is the core of the ETL procedure: it extracts, curates, and normalizes data. This relieves higher-layer APIs and UX subsystems from having to parse through mountains of irrelevant information or understand different data formats. As a result, API and UX performance improves, and complexity is reduced in the upper layers, where value-added experiences are created.
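
The post does not describe the data hypervisor's internals. The following hypothetical sketch shows the general pattern under stated assumptions: per-source field maps (names loosely modeled on SNMP IF-MIB objects and a made-up telemetry format) rename fields to one common schema and discard the rest, and the normalized records are then loaded into a toy knowledge graph, not Selector AI's. Upper layers querying the graph only ever see the common names.

```python
from collections import defaultdict

# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "snmp": {"sysName": "device", "ifDescr": "interface", "ifInErrors": "in_errors"},
    "telemetry": {"host": "device", "port": "interface", "errors_in": "in_errors"},
}


def normalize(source, record):
    """Rename source-specific fields to the common schema and drop the rest."""
    mapping = FIELD_MAPS[source]
    return {common: record[raw] for raw, common in mapping.items() if raw in record}


class KnowledgeGraph:
    """Toy knowledge graph stored as subject -> set of (predicate, object) edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.edges[subject].add((predicate, obj))

    def load(self, rec):
        """Load one normalized record as device and interface edges."""
        iface = f'{rec["device"]}:{rec["interface"]}'
        self.add(rec["device"], "has_interface", iface)
        self.add(iface, "in_errors", rec["in_errors"])


kg = KnowledgeGraph()
kg.load(normalize("snmp", {"sysName": "leaf-01", "ifDescr": "eth1",
                           "ifInErrors": 3, "ifSpeed": 10**10}))
kg.load(normalize("telemetry", {"host": "leaf-01", "port": "eth2", "errors_in": 0}))
print(dict(kg.edges))
```

Although the two inputs use different field names, both interfaces end up attached to the same `leaf-01` node, and the irrelevant `ifSpeed` field never reaches the graph.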

Conclusion

Flexibility, performance, no code changes, and high-fidelity correlations. These are achieved by extracting only what is needed from information hierarchies, augmenting it with information from other databases, and correlating across multiple types and sources of information. Hierarchical, Extraction, Augmentation, Transformation, and Loading: ETL that HEALs.
