The era of 'static' AI is coming to an end. We've all seen what happens when you ask a standard AI a complex question: it either misses the point or gets confused. Even with recent advances in AI search, where models such as GPT 5.4 can perform more in-depth analysis and high-quality verification, the core problem is often not the AI's 'brain' but how it obtains its information.

New research on RLM-on-KG (Recursive Language Model on Knowledge Graph) suggests a better way forward. Instead of treating the AI as a passive reader of documents, we turn it into an autonomous navigator that searches the data in real time.

1. Controller Continuum: Choosing Your Engine

Not every question requires a high-powered AI "navigator." Sometimes a simple search is enough. We've identified a spectrum of strategies that balance speed, cost, and capability:

| Controller | Mechanism | Cost | Best for |
|---|---|---|---|
| Vector-only | Simple keyword/semantic matching | Low (~2K tokens) | Single-hop facts |
| GraphRAG-local | 1-hop expansion (neighbors) | Low (~2K tokens) | Medium complexity |
| Heuristic RLM | Rule-based "breadth-first" search | Medium (~5K tokens) | Predictable connections |
| LLM RLM | Adaptive, LLM-powered navigation | High (~50K tokens) | Highly scattered evidence |
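The controller continuum above can be sketched as a cost-ordered lookup: try the cheapest engine whose reach matches the question. This is a minimal illustration, not the paper's implementation; the hop budgets and the `pick_controller` helper are assumptions.

```python
# Hypothetical sketch of the controller continuum: pick the cheapest
# strategy whose reach matches the question's estimated hop depth.
# Token budgets mirror the table above; max_hops values are illustrative.

from dataclasses import dataclass

@dataclass
class Controller:
    name: str
    token_budget: int   # approximate cost per query
    max_hops: int       # how far it can traverse the graph

CONTROLLERS = [
    Controller("vector-only",    2_000, 0),   # semantic matching, single-hop facts
    Controller("graphrag-local", 2_000, 1),   # 1-hop neighbor expansion
    Controller("heuristic-rlm",  5_000, 3),   # rule-based breadth-first search
    Controller("llm-rlm",       50_000, 10),  # adaptive LLM navigation
]

def pick_controller(estimated_hops: int) -> Controller:
    """Return the cheapest controller that covers the estimated hop depth."""
    for c in CONTROLLERS:  # list is ordered cheapest-first
        if c.max_hops >= estimated_hops:
            return c
    return CONTROLLERS[-1]  # fall back to the most capable engine
```

The key design choice is ordering: because the list runs cheapest-first, the loop naturally implements "escalate only when necessary."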

2. When Is the LLM Navigator Really Worth It?

Short answer: When the evidence is scattered.

If the answer to your question lives in a single paragraph, using a deep-dive AI navigator is like hiring a private investigator to find the car keys in your pocket: a waste of money. But when the "golden evidence" is spread across 6 to 11+ different sections or documents, the AI navigator's win rate increases significantly. In short, the suggested strategy is:

  • Concentrated evidence: Use a sub-second, inexpensive "static" search.
  • Scattered evidence: Let the LLM navigator chase the distant, structurally related links that traditional search is "blind" to.
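The routing rule above reduces to counting how many distinct sources hold the top-ranked evidence. A minimal sketch, assuming we already have a candidate evidence list; the threshold of 6 comes from the 6-to-11+ range mentioned above, and the function name is hypothetical.

```python
# Sketch of evidence-scatter routing: concentrated evidence goes to cheap
# static search, scattered evidence goes to the LLM navigator.

def route_query(evidence_sections: list[str], scatter_threshold: int = 6) -> str:
    """evidence_sections: source section/document IDs of top-ranked evidence.
    Returns which engine should handle the query."""
    distinct_sources = len(set(evidence_sections))
    if distinct_sources >= scatter_threshold:
        return "llm-navigator"   # golden evidence spread across many sections
    return "static-search"       # concentrated evidence: cheap and fast
```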

3. Capacity Ladder

It turns out that navigation is a skill that scales with the intelligence of the model. Not every AI "brain" can handle the wheel:

  • Pinnacle (Claude Haiku 4.5): Currently the gold standard, showing the strongest statistical advantage in searching scattered data.
  • Navigator (Gemini 2.5 Flash): Capable of performing adaptive exploration and routing around “dead ends” in the data.
  • Flatline (Gemma 4): Smaller models often lose their "adaptive" edge and fall back to shallow, rule-like patterns.

4. Key Rule: Separate Search from Ranking

The most important lesson from this research is that you should not ask AI to do everything for you. We have found that the best results come from a division of labor:

  1. AI explores: Let the LLM use its "intuition" to discover a broad pool of candidate information in the graph.
  2. Mathematics decides: Once the pool is assembled, let pure vector mathematics (cosine similarity) handle the final ranking.

Letting the AI handle the final scoring often introduces "noise" and actually degrades performance. Let LLMs explore, but let vectors decide.
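The division of labor above can be sketched in a few lines: the LLM's job ends once the candidate pool is assembled, and plain cosine similarity does the ranking. Embeddings here are raw float lists for illustration; in practice they would come from an embedding model.

```python
# "LLM explores, vectors decide": final ranking by cosine similarity only.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(query_vec: list[float],
                    candidates: list[tuple[str, list[float]]]):
    """candidates: (text, embedding) pairs gathered by the LLM explorer.
    Returns the pairs sorted by similarity to the query, best first."""
    return sorted(candidates,
                  key=lambda c: cosine_similarity(query_vec, c[1]),
                  reverse=True)
```

No LLM call appears in the ranking step at all, which is the point: the scoring stays deterministic and noise-free.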

5. Selective Escalation: A Production Blueprint

You don't have to choose between "cheap" and "smart." In production environments, we use selective escalation:

  • Step 1: Run a sub-second "first pass" using GraphRAG-local. This answers roughly 53% of questions correctly.
  • Step 2: Use a "diagnostic gate" to look for scatter indicators (such as a large number of entities mentioned in the query).
  • Step 3: Trigger the expensive, high-powered detailed analysis only for the hardest 47% of questions.

6. The "Hidden" Benefit: KG Diagnostics

Beyond answering questions, these AI navigators act as high-intensity stress tests for your data. By watching where the AI navigator stalls or wanders, we can diagnose the health of your knowledge graph:

  • Coverage gaps: If the AI falls back to general search, you're missing specific entities or aliases.
  • Connectivity failures: If the AI gets stuck after a few "hops," your data neighborhoods are too disconnected.
  • Provenance errors: If the AI retrieves wrong information, your "links" are inaccurate.
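The three failure modes above map naturally onto a trace classifier. The trace fields and thresholds below are assumptions for illustration, not the paper's schema.

```python
# Sketch: turn a navigator trace into KG health diagnostics.

def diagnose(trace: dict) -> list[str]:
    """Map a navigator trace onto the three failure modes above."""
    issues = []
    if trace.get("fell_back_to_general_search"):
        issues.append("coverage gap: missing entities or aliases")
    if trace.get("max_hops_reached", 0) <= 2 and not trace.get("answered"):
        issues.append("connectivity failure: disconnected neighborhoods")
    if trace.get("retrieved_wrong_facts"):
        issues.append("provenance error: inaccurate links")
    return issues
```

Run over a batch of traces, a report like this tells you where to invest in graph clean-up before spending more on the navigator itself.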

This makes RLM-on-KG not just a discovery tool, but an automated “clean-up crew” for enterprise data.


This blog post is based on the research paper "RLM-on-KG: Estimation First, LLM When Necessary" by Andrea Volpini and Eli Raad.