Introduction
As AI evolves into networks of specialist agents, securely routing queries while filtering out irrelevant or malicious inputs becomes a critical challenge. In our collaboration on the paper “Guarded Query Routing for Large Language Models,” we introduce GQR-Bench, a benchmark for domain-specific routing across law, finance, and healthcare together with out-of-distribution testing, and demonstrate that simple local classifiers rival cloud LLMs in accuracy at much lower latency. NetFire’s Pydantic AI framework and hardware partnerships now apply these insights to deliver secure and efficient on-premises deployments.
Download Paper
From Monolithic AI to Multi-Agent Intelligence
The era of rigid, one-size-fits-all machine intelligence is coming to a close. While large, broad-purpose architectures have demonstrated impressive capabilities, the frontier of artificial intelligence is moving towards a more specialized and collaborative paradigm. The future is not a single, all-knowing oracle but a dynamic ecosystem of AI agents, each an expert in its domain, working in concert. Imagine a team of AIs, one for financial analysis, another for medical diagnostics, and a third for creative writing, all seamlessly orchestrated. This approach promises greater power, efficiency, and nuance than any single model could achieve.
The Challenge of Secure Query Routing
This multi-agent future, however, introduces a critical new challenge: how do you manage the traffic? When a request arrives, how do you ensure it reaches the right specialized agent safely and reliably? This problem, which we refer to as secure query routing, involves more than simple topic classification. A robust dispatcher must not only pinpoint the correct “in-distribution” domain for a query (like law or finance) but also correctly handle “out-of-distribution” (OOD) queries. These OOD inquiries can range from harmless off-topic prompts to requests in unsupported languages or even malicious inputs designed to compromise the system.
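In code, guarded routing boils down to two steps: score a query against the known domains, then route it only if the top score clears a confidence threshold. The sketch below illustrates this shape; the keyword scorer is a toy stand-in for a trained classifier, and the domain keywords and threshold are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a guarded query router: score a query against known
# domains, then route it or reject it as out-of-distribution (OOD).
# The keyword-overlap scorer is a toy stand-in for a trained classifier
# (e.g. a bag-of-words model); all names and thresholds are illustrative.

DOMAIN_KEYWORDS = {
    "law": {"contract", "liability", "statute", "plaintiff"},
    "finance": {"portfolio", "interest", "dividend", "equity"},
    "healthcare": {"diagnosis", "symptom", "dosage", "patient"},
}

def route(query: str, threshold: float = 0.5) -> str:
    """Return the best-matching domain, or 'OOD' if confidence is too low."""
    tokens = set(query.lower().split())
    # Score each domain by keyword overlap, normalized to [0, 1].
    scores = {
        domain: len(tokens & keywords) / len(keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best_domain, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_domain if best_score >= threshold else "OOD"

print(route("what is the dosage for this patient"))  # healthcare
print(route("write me a poem about autumn"))         # OOD
```

The same structure applies whatever the underlying scorer is: swap the keyword overlap for a classifier's softmax confidence and the rejection logic stays identical.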
Introducing GQR-Bench: A Benchmark for Routing Robustness
To address this, a recent paper we collaborated on, “Guarded Query Routing for Large Language Models,” set out to systematically evaluate the best tools for the purpose. To do this fairly, the research first introduced a new evaluation framework, the Guarded Query Routing Benchmark (GQR-Bench). GQR-Bench includes in-domain datasets for law, finance, and healthcare, plus seven out-of-distribution collections to test robustness against everything from toxic content to simple off-topic chatter. Using this benchmark, the paper contrasted a wide spectrum of models, from powerful LLMs like GPT-4o-mini (though we at NetFire routinely leverage the very latest, such as GPT-4.1 and Gemini 2.5 Flash, in our cloud deployments) to more traditional machine-learning classifiers like SVM and XGBoost. The goal was not to chase bleeding-edge API endpoints, but to see how on-prem and local models stack up against commonly used hosted services.
Key Findings: Balancing Accuracy, Speed, and Cost
The findings highlight a crucial trade-off between accuracy and throughput. To measure performance, the paper used the GQR-Score, a harmonic mean of in-domain (ID) and out-of-distribution (OOD) accuracy. This score penalizes models that excel at one task but stumble at the other. While large language models like Llama-3.1-8B achieved the highest GQR-Score of about 91%, they did so with high latency, taking over 62 milliseconds per query on local hardware. The most compelling results came from a simpler approach: WideMLP, a continuous bag-of-words classifier. By applying a confidence threshold to reject OOD queries, WideMLP achieved a GQR-Score of 88%, just three points shy of the top LLM, yet with a latency under 4 milliseconds. That is over 95% of the LLM’s performance at a tiny fraction of the computational cost. For applications prioritizing raw speed, fastText delivered results in under 1 millisecond with a respectable 80% GQR-Score.
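To see why the GQR-Score punishes lopsided models, it helps to compute the harmonic mean directly. The accuracy values below are made-up illustrations, not figures from the paper:

```python
# Illustrative computation of the GQR-Score: the harmonic mean of
# in-domain (ID) and out-of-distribution (OOD) accuracy.
# The input accuracies here are invented examples for demonstration.

def gqr_score(id_acc: float, ood_acc: float) -> float:
    """Harmonic mean of ID and OOD accuracy."""
    if id_acc + ood_acc == 0:
        return 0.0
    return 2 * id_acc * ood_acc / (id_acc + ood_acc)

# A balanced model scores close to its individual accuracies...
print(round(gqr_score(0.90, 0.88), 3))  # 0.890
# ...while an imbalanced one is dragged down hard by its weaker side,
# even though its arithmetic mean (0.745) looks much better.
print(round(gqr_score(0.99, 0.50), 3))  # 0.664
```

This is the same reason F1 uses a harmonic rather than arithmetic mean of precision and recall: a router that nails in-domain queries but waves through OOD traffic cannot hide behind its strong side.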
NetFire’s Philosophy: Lean, Secure, and Powerful AI
This research and its practical implications are central to our philosophy at NetFire. The findings endorse our approach to building intelligent systems that are not only powerful but also lean and secure by design. We are integrating these principles into the core of our new custom multi-agentic framework. Built on Pydantic AI, this framework is engineered to create the kind of versatile and secure multi-agent scenarios explored in the paper, enabling our clients to deploy sophisticated AI ecosystems with confidence.
Building the Future: Multi-Agent Frameworks and Infrastructure
By combining these routing advances with our end-to-end AI infrastructure services, from hardware design to framework integration, we help organizations orchestrate expert models reliably and securely. Through partnerships with Supermicro, NVIDIA, and AMD, and our deep expertise in AI-ready architectures, NetFire empowers you to build and operate tailored multi-agent solutions that meet your performance, security, and compliance needs.
Authors
Richard Šléher
William Brach
Tibor Sloboda
Kristián Košťál
Lukas Galke
In collaboration with STU, Bratislava, Slovakia, and the Centre for Machine Learning, University of Southern Denmark.
How to learn more or get in touch
- Visit our Resources section for company news, research, and product updates.
- Check the Support Center for guides, documentation, and help desk access.
- For partnership, customer, or pre-sales inquiries, email hello@netfire.com
- For all media inquiries, email press@netfire.com