Introduction
As AI evolves into networks of specialist agents, securely routing queries while filtering out irrelevant or malicious inputs becomes a critical challenge. In our collaboration on the paper “Guarded Query Routing for Large Language Models,” we introduce GQR-Bench, a benchmark for domain-specific routing across law, finance, and healthcare together with out-of-distribution testing, and demonstrate that simple local classifiers rival cloud LLMs in accuracy at much lower latency. NetFire’s Pydantic AI framework and hardware partnerships now apply these insights to deliver secure and efficient on-premises deployments.
Download Paper
From Monolithic AI to Multi-Agent Intelligence
The era of rigid, one-size-fits-all machine intelligence is coming to a close. While large, broad-purpose architectures have demonstrated impressive capabilities, the frontier of artificial intelligence is moving towards a more specialized and collaborative paradigm. The future is not a single, all-knowing oracle but a dynamic ecosystem of AI agents, each an expert in its domain, working in concert. Imagine a team of AIs, one for financial analysis, another for medical diagnostics, and a third for creative writing, all seamlessly orchestrated. This approach promises greater power, efficiency, and nuance than any single model could achieve.
The Challenge of Secure Query Routing
This multi-agent future, however, introduces a critical new challenge: how do you manage the traffic? When a request arrives, how do you ensure it reaches the right specialized agent safely and reliably? This problem, which we refer to as secure query routing, involves more than simple topic classification. A robust dispatcher must not only pinpoint the correct “in-distribution” domain for a query (like law or finance) but also correctly handle “out-of-distribution” (OOD) queries. These OOD inquiries can range from harmless off-topic prompts to requests in unsupported languages or even malicious inputs designed to compromise the system.
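In code, guarded routing boils down to two steps: score a query against the known domains, then route it only if the top score clears a confidence threshold. The sketch below illustrates this shape; the keyword scorer is a toy stand-in for a trained classifier, and the domain keywords and threshold are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a guarded query router: score a query against known
# domains, then route it or reject it as out-of-distribution (OOD).
# The keyword-overlap scorer is a toy stand-in for a trained classifier
# (e.g. a bag-of-words model); all names and thresholds are illustrative.

DOMAIN_KEYWORDS = {
    "law": {"contract", "liability", "statute", "plaintiff"},
    "finance": {"portfolio", "interest", "dividend", "equity"},
    "healthcare": {"diagnosis", "symptom", "dosage", "patient"},
}

def route(query: str, threshold: float = 0.5) -> str:
    """Return the best-matching domain, or 'OOD' if confidence is too low."""
    tokens = set(query.lower().split())
    # Score each domain by keyword overlap, normalized to [0, 1].
    scores = {
        domain: len(tokens & keywords) / len(keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best_domain, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_domain if best_score >= threshold else "OOD"

print(route("what is the dosage for this patient"))  # healthcare
print(route("write me a poem about autumn"))         # OOD
```

The same structure applies whatever the underlying scorer is: swap the keyword overlap for a classifier's softmax confidence and the rejection logic stays identical.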
Introducing GQR-Bench: A Benchmark for Routing Robustness
To address this, a recent paper we collaborated on, “Guarded Query Routing for Large Language Models,” set out to systematically evaluate the best tools for the purpose. To do this fairly, the research first introduced a new evaluation framework, the Guarded Query Routing Benchmark (GQR-Bench). GQR-Bench includes in-domain datasets for law, finance, and healthcare, plus seven out-of-distribution collections to test robustness against everything from toxic content to simple off-topic chatter. Using this benchmark, the paper contrasted a wide spectrum of models, from powerful LLMs like GPT-4o-mini (though we at NetFire routinely leverage the very latest, such as GPT-4.1 and Gemini 2.5 Flash, in our cloud deployments) to more traditional machine-learning classifiers like SVM and XGBoost. The goal was not to chase bleeding-edge API endpoints, but to see how on-prem and local models stack up against commonly used hosted services.
Key Findings: Balancing Accuracy, Speed, and Cost
The findings highlight a crucial trade-off between accuracy and throughput. To measure performance, the paper used the GQR-Score, a harmonic mean of in-domain (ID) and out-of-distribution (OOD) accuracy. This score penalizes models that excel at one task but stumble at the other. While large language models like Llama-3.1-8B achieved the highest GQR-Score of about 91%, they did so with high latency, taking over 62 milliseconds per query on local hardware. The most compelling results came from a simpler approach: WideMLP, a continuous bag-of-words classifier. By applying a confidence threshold to reject OOD queries, WideMLP achieved a GQR-Score of 88%, just three points shy of the top LLM, yet with a latency under 4 milliseconds. That is over 95% of the LLM’s performance at a tiny fraction of the computational cost. For applications prioritizing raw speed, fastText delivered results in under 1 millisecond with a respectable 80% GQR-Score.
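To see why the GQR-Score punishes lopsided models, it helps to compute the harmonic mean directly. The accuracy values below are made-up illustrations, not figures from the paper:

```python
# Illustrative computation of the GQR-Score: the harmonic mean of
# in-domain (ID) and out-of-distribution (OOD) accuracy.
# The input accuracies here are invented examples for demonstration.

def gqr_score(id_acc: float, ood_acc: float) -> float:
    """Harmonic mean of ID and OOD accuracy."""
    if id_acc + ood_acc == 0:
        return 0.0
    return 2 * id_acc * ood_acc / (id_acc + ood_acc)

# A balanced model scores close to its individual accuracies...
print(round(gqr_score(0.90, 0.88), 3))  # 0.890
# ...while an imbalanced one is dragged down hard by its weaker side,
# even though its arithmetic mean (0.745) looks much better.
print(round(gqr_score(0.99, 0.50), 3))  # 0.664
```

This is the same reason F1 uses a harmonic rather than arithmetic mean of precision and recall: a router that nails in-domain queries but waves through OOD traffic cannot hide behind its strong side.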
NetFire’s Philosophy: Lean, Secure, and Powerful AI
This research and its practical implications are central to our philosophy at NetFire. The findings endorse our approach to building intelligent systems that are not only powerful but also lean and secure by design. We are integrating these principles into the core of our new custom multi-agentic framework. Built on Pydantic AI, this framework is engineered to create the kind of versatile and secure multi-agent scenarios explored in the paper, enabling our clients to deploy sophisticated AI ecosystems with confidence.
Building the Future: Multi-Agent Frameworks and Infrastructure
By combining these routing advances with our end-to-end AI infrastructure services, from hardware design to framework integration, we help organizations orchestrate expert models reliably and securely. Through partnerships with Supermicro, NVIDIA, and AMD, and our deep expertise in AI-ready architectures, NetFire empowers you to build and operate tailored multi-agent solutions that meet your performance, security, and compliance needs.
Authors
Richard Šléher
William Brach
Tibor Sloboda
Kristián Košťál
Lukas Galke
In collaboration with STU, Bratislava, Slovakia, and the Centre for Machine Learning, University of Southern Denmark.
How to learn more or get in touch
- Visit our Resources section for company news, research, and product updates.
- Check the Support Center for guides, documentation, and help desk access.
- For partnership, customer, or pre-sales inquiries, email hello@netfire.com
- For all media inquiries, email press@netfire.com