2025 Gartner® Market Guide for Event Intelligence Solutions
Selector Recognized as a Representative Vendor.

Peaks and Pitfalls of LLM Usage in Production Systems

Over the last eighteen months, Selector has progressively integrated large language models into its operations. We use them in both extractive and abstractive ways, seeking both to interpret natural language effectively and to communicate the way the most seasoned network engineers would in the real world. The evolution of the LLM space over this period, from the models themselves to the tools and frameworks used to work with them, has progressed at a phenomenal pace. Tools that we used initially have already been deprecated or abandoned, new and better ones have sprung up, and we’ve been forced into rapid adoption and rapid migration scenarios along the way. It’s been quite a journey, and we’ve learned a tremendous amount.

Thinking back over the process of building Selector Copilot, our chat-style question-answering system that allows you to “talk” to your network, we’ve picked up many valuable lessons, learning both from the pitfalls we managed to dodge and the ones we stumbled into. Copilot is the last step of a vast pipeline of data enrichment and interpretation, stacking layers of increasing insight using traditional machine learning techniques and transformation steps. Each step builds upon the last, bubbling its way up toward the final output – a concise and accurate natural language summary of the state of your network and devices at any given moment. Getting the data to the point where it can be handed off to an LLM requires careful consideration, with many opportunities for errors along the way. These are just a few of the most important lessons we’ve learned while integrating an LLM into the Selector app.

Selector's Patented Machine Learning Engine

1. Overestimating the LLM’s inherent understanding beyond language

Fundamentally, a large language model is good at language. It does not know your data, the particulars of your naming conventions, the topologies in which your devices are organized, or the interconnections between them. It cannot accurately determine root causes, correlate events, or explain why something might be happening in a non-generic way. Given the opportunity, it can and will confidently make up answers that sound reasonable, but are fundamentally inaccurate.

This highlights the importance of data processing using traditional machine learning and transformation methods, so that the final context presented to the LLM already contains the answers being sought. A few examples…

  • You cannot pass the LLM a list of events and expect it to determine the root cause. The root cause needs to be predetermined and passed to the LLM as a definitive, deterministic fact. The root cause analysis is where the real work is done, leaving the LLM the simple job of presenting the findings as natural language. I have come to think of the LLM as just another presentation layer: in the same way that we present data as interactive widgets in our web app via a browser, or as a snapshot image in our Slack or Teams interface, we can present our data as a natural language summary.
  • Similarly, you cannot pass an LLM a list of device metrics and expect it to determine the status of the device. You must provide the status of the device to the LLM, determined by a rich set of underlying analysis pipelines. At Selector, we use combinations of traditional ML based baselining, statistical modeling, regression models, NER models, outlier detection models, and correlation models (among others) to make these determinations. 
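To make the “presentation layer” idea concrete, here is a minimal, hypothetical sketch: every conclusion (root cause, status, event grouping) arrives precomputed from the upstream pipeline, and the prompt instructs the model only to restate those facts. The function and field names are illustrative assumptions, not Selector’s actual API.

```python
# Hypothetical sketch: the LLM receives precomputed facts, not raw events.
# All names here are illustrative, not Selector's production code.

def build_presentation_prompt(facts: dict) -> str:
    """Assemble a prompt in which every conclusion is already determined
    upstream; the LLM's only job is to rephrase it as natural language."""
    lines = [
        "You are a network operations assistant.",
        "Report ONLY the facts below. Do not infer causes or correlations.",
        "",
        f"Root cause (precomputed): {facts['root_cause']}",
        f"Affected device: {facts['device']}",
        f"Device status (precomputed): {facts['status']}",
        f"Correlated events: {len(facts['events'])} (already grouped upstream)",
    ]
    return "\n".join(lines)

facts = {
    "root_cause": "Line card reset on core-rtr-01 following config push",
    "device": "core-rtr-01",
    "status": "degraded",
    "events": ["bgp_flap", "high_cpu", "linecard_reset"],
}
print(build_presentation_prompt(facts))
```

The key design choice is that the prompt carries conclusions, never evidence the model is expected to reason over.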

Even beyond that, our data passes through several final stages of transformation specific to natural language presentation. This further refines the data in a way specific to how the LLM needs and understands context data. In fact, we go so far as to generate a pseudo-natural language summary using traditional data-processing techniques that serves as a hint to the LLM as to what to report.
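As an illustration of that pseudo-natural-language hint, a deterministic template can render a structured record into a draft sentence for the LLM to polish. This is a toy sketch with assumed field names, not our production transformation:

```python
# Illustrative template-based "hint" summary, produced by deterministic
# code before the LLM ever sees the data. Field names are hypothetical.

def hint_summary(record: dict) -> str:
    """Render a structured device record as a draft sentence."""
    if record["violations"] == 0:
        return f"{record['device']} is healthy; all {record['metrics']} metrics nominal."
    return (
        f"{record['device']} has {record['violations']} metric(s) in violation "
        f"out of {record['metrics']}; worst offender: {record['worst_metric']}."
    )

print(hint_summary({"device": "edge-sw-12", "violations": 2,
                    "metrics": 14, "worst_metric": "if_in_errors"}))
```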

Needless to say, what we’ve learned is to leave nothing up for interpretation when it comes to LLM consumption. Especially in mission critical environments, it is imperative to get this part right.

2. Context is King

I remember in my elementary and high school math classes, I’d be given word problems that would include some related, but ultimately irrelevant, facts. These would serve to see if I could filter out the signal from the noise, and not get thrown off by some data points that weren’t part of the final calculations. Inevitably though, I’d get snagged by one of these superfluous nuggets and end up with the wrong answer.

If it can happen to a human, it can happen to an LLM (now that I say that, I may have to give that some additional thought). Regardless, context that is too large or too dissimilar can lead to some very unexpected responses.

For example, the Selector app is constantly scanning all known datapoints and events, correlating them into records that represent single anomalies. These records are labeled by device, interface, subnet, and so on, allowing us to quickly check a specific entity for issues. Initially, we attempted to scan across the entire collection looking for all anomalies related to an entity, indiscriminate of the anomaly or event type. However, this proved problematic as the LLM would incorrectly connect the dots between these often unrelated anomalies in strange and interesting ways, presenting the user with a confusing and convoluted summary. We’ve since implemented more robust filtering to ensure only genuinely related anomalies are grouped for summarization.
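The kind of filtering described above can be sketched roughly as follows: group anomalies by entity, then keep a group only when its event families are mutually compatible. The compatibility map and record fields here are assumptions for illustration, not Selector’s production logic.

```python
# Minimal sketch: group anomalies per entity, and keep a group only if
# every event family in it is mutually compatible. The COMPATIBLE map
# and record fields are illustrative assumptions.

from collections import defaultdict

COMPATIBLE = {
    "routing": {"routing", "interface"},    # e.g. BGP flaps relate to interface events
    "interface": {"routing", "interface"},
    "environment": {"environment"},         # fan/temp alarms stay on their own
}

def group_related(anomalies):
    groups = defaultdict(list)
    for a in anomalies:
        groups[a["entity"]].append(a)
    related = []
    for entity, items in groups.items():
        families = {a["family"] for a in items}
        # keep the group only when every family accepts all the others
        if all(families <= COMPATIBLE[f] for f in families):
            related.append((entity, items))
    return related

anoms = [
    {"entity": "core-rtr-01", "family": "routing"},
    {"entity": "core-rtr-01", "family": "interface"},
    {"entity": "core-rtr-02", "family": "routing"},
    {"entity": "core-rtr-02", "family": "environment"},   # unrelated mix: dropped
    {"entity": "edge-sw-12", "family": "environment"},
]
print([entity for entity, _ in group_related(anoms)])
```

Here the mixed routing/environment group on core-rtr-02 is filtered out rather than summarized as one incident.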

Of course, the best-case scenario would be striking that perfect balance of providing just enough information needed to answer a question, and nothing more. We have employed a variety of techniques toward this end, including:

  • Relevance filtering: using semantic similarity to include only the most relevant columns from a dataframe.
  • Distributions: Precalculating distribution percentages grouped by a particular field of interest. For example, rather than device and status pairings for thousands of devices, we will precalculate distributions across status, allowing the LLM to report percentage and counts for each status bucket.
  • Prioritization: In general, our users are most interested in knowing what is wrong, so we will prioritize records that are in violation of some predefined or autobaselined value, and push records with nominal values to the back.
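The “Distributions” technique above can be sketched in a few lines: rather than shipping thousands of (device, status) rows, precompute counts and percentages per status bucket and hand the LLM only the summary. A minimal sketch with illustrative field names:

```python
# Sketch of precomputed status distributions: thousands of rows collapse
# into a handful of buckets, ordered largest-first. Field names are
# illustrative, not Selector's schema.

from collections import Counter

def status_distribution(rows):
    counts = Counter(r["status"] for r in rows)
    total = sum(counts.values())
    return {
        status: {"count": n, "pct": round(100 * n / total, 1)}
        for status, n in counts.most_common()  # biggest buckets first
    }

rows = (
    [{"status": "healthy"}] * 970
    + [{"status": "degraded"}] * 25
    + [{"status": "down"}] * 5
)
print(status_distribution(rows))
```

One thousand rows become three buckets the LLM can report directly as counts and percentages.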

On the flip side of that, leaving out relevant information can starve the LLM, giving it no choice but to hallucinate and make up an answer. Our prompts always include instructions to report that it can’t determine an answer if relevant data hasn’t been provided, but invariably we see it happen anyway.

For example, in the early days when token limits were stricter and context windows were smaller, we would indiscriminately strip all numeric values from a dataframe before passing it along to the LLM. When questions such as “What is the latency between device A and device B?” were asked, the LLM wouldn’t have the information needed to report the average latency in the given time window. It did have information about whether the latency was within normal limits or in violation, but the actual value was not present.

Today, we use a RAG (retrieval-augmented generation) mechanism that finds the top n columns related to the user’s query and provides only those column values to the LLM. This has allowed us to be more precise in what we give the LLM, without overwhelming it with superfluous information. These incremental learnings and adaptations have made a big difference in both the quality and accuracy of responses.
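As a simplified, self-contained stand-in for that column-selection step, the sketch below scores each column name against the user’s query and keeps the top n. A production system would use learned embeddings for semantic similarity; a toy token-overlap score keeps the example runnable, and all column names are hypothetical.

```python
# Toy stand-in for embedding-based column selection: rank columns by how
# much their (tokenized) names overlap the query, keep the top n.
# Real systems would embed query and column descriptions instead.

def score(query: str, column: str) -> float:
    q = set(query.lower().replace("_", " ").split())
    c = set(column.lower().replace("_", " ").split())
    return len(q & c) / max(len(c), 1)

def top_n_columns(query: str, columns, n: int = 2):
    # sorted() is stable, so ties keep their original column order
    ranked = sorted(columns, key=lambda col: score(query, col), reverse=True)
    return ranked[:n]

cols = ["avg_latency_ms", "packet_loss_pct", "hostname", "latency_p99_ms"]
print(top_n_columns("what is the latency between device A and device B", cols))
```

Only the two latency columns reach the LLM; the irrelevant ones never enter the context.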

3. Poor or Vague Prompting

I won’t spend too much time on this one, as it’s pretty obvious. The prompts need a high level of detail. Leave no boundary undefined, provide examples, be specific, and be assertive. Our prompts are highly tuned, and we continually update them to address new scenarios and establish new boundaries or expectations. The end result can be quite lengthy, but gives us the kind of result that we and our customers desire. We will continue refining and adapting these prompts as new models with new capabilities roll out, as our data changes, and as new edge cases are discovered.

Conclusion

So there you have it…certainly not an exhaustive list of things we’ve learned, but absolutely the ones that have resulted in the biggest impact. As I look to the future of Copilot, I can’t help but feel excited about the opportunities ahead. We are looking forward to introducing numerous enhancements in the coming months, including some new and experimental services that rely heavily on the power of Large Language Models. Some of these include natural language to query translation, improved reasoning capabilities, better understanding of conversational context, and movement into the MCP and agentic landscape. These will all, no doubt, come with their own set of challenges, but ones we look forward to tackling head on.

To stay up-to-date with the latest news and blog posts from Selector, follow us on LinkedIn or X and subscribe to our YouTube channel.
