A LeverageAI Ebook
The Cognition
Supply Chain
From Search to Compounding Agentic Cognition
Scott Farrell
LeverageAI — 2026
The Wrong Variable
You upgraded to the frontier model. Your outputs are still generic. The model isn't the problem.
You've just upgraded to the frontier model. Spent weeks migrating. Your team is excited. The benchmarks are incredible — higher scores on every leaderboard, bigger context window, faster inference. You run your first real domain query.
The output is... still generic. Still vague. Still missing the nuances your domain demands. Still the kind of thing that sounds plausible to someone who doesn't know the subject, and frustrating to someone who does.
The natural conclusion: "We need better prompts." Or: "The model isn't smart enough yet — maybe the next release."
Both conclusions are wrong. You're optimising the wrong variable. The model isn't the bottleneck. The supply chain feeding it context is.
The Model Intelligence Myth
The default assumption driving the entire AI industry: better model = better output. Bigger context window = more knowledge. Higher benchmark scores = more capable. This is the mental model everyone runs on.
Model vendors market intelligence as THE differentiator. Upgrading feels like progress without architectural risk. Benchmarks reinforce the narrative — MMLU, HumanEval, GPQA — each new release inches the numbers upward, and the marketing machines spin it into inevitability.
But here's the uncomfortable truth: for domain-specific, unstable-domain work — the kind enterprises actually need — model capability matters far less than what the model gets to think with.1
Compressed context works remarkably well. Frameworks take the model from zero to 100 in one second. Not because the model suddenly learned your whole worldview. But because the right context architecture gave it a high-bandwidth index into everything it needed.
The model didn't get smarter. It got a map.
Blue Whales vs Implement AI
Ask any model about blue whales. You'll get brilliant, detailed, nuanced answers. Migration patterns, population dynamics, the physics of baleen feeding, conservation timelines. Excellent stuff.
Now ask it how to implement AI in your organisation.
Fog. Plausible-sounding generalities. "Consider your use case." "Start with a pilot." "Ensure executive buy-in." The kind of advice that fills whitepapers but empties bank accounts. If you ask AI how to implement AI, it won't even scratch the surface. It'll be wrong, inconsistent. But ask AI about blue whales and you've got heaps of knowledge and background.
Why the difference? Blue whales are a stable domain: well-established facts, slow-moving consensus, clear signals, bounded disagreement. A language model does well here even with stale training data, because the underlying shape of the knowledge doesn't change daily.
"Implement AI" is an unstable domain: the technology changes quarterly, best practices are still forming, failure modes are socio-technical (politics, incentives, governance), and the question is massively under-specified. What industry? What risk appetite? What data maturity? What team structure?
The prompt is effectively: "Please search an unbounded space of possibilities and also guess which constraints I forgot to mention." Any navigator would struggle without a map, regardless of intelligence.
The Blue Whales Test
Ask your AI system a domain question. Then ask it about blue whales.
If the blue whales answer is dramatically better, you don't have a model problem. You have a context architecture problem. The model can reason brilliantly — it just doesn't have a map of YOUR territory.
This diagnostic takes 60 seconds and reveals whether upgrading your model or upgrading your context architecture will produce the bigger improvement.
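For teams who want to script the check, here is a minimal sketch, assuming a generic `call_llm` helper standing in for whichever model client you already use. The helper and the example questions are illustrative, not part of any specific API.

```python
# The Blue Whales Test as a 60-second script.
# `call_llm` is a placeholder: swap in your actual model client.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real call to your model of choice."""
    return f"[model answer to: {prompt}]"

def blue_whales_test(domain_question: str) -> None:
    control = call_llm("Describe the feeding behaviour and migration patterns of blue whales.")
    candidate = call_llm(domain_question)
    print("--- Stable-domain control (blue whales) ---\n", control)
    print("--- Your domain question ---\n", candidate)
    # If the control answer is dramatically richer and more specific,
    # the bottleneck is context architecture, not model capability.

blue_whales_test("How should a 500-person manufacturer start implementing AI?")
```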
Frameworks as Compression Codec
What solves the unstable-domain problem? Compressed frameworks that turn messy, high-entropy problems into named shapes the model can reason with.
Once you name the shapes — lane doctrine, augmentation vs replacement, fast-slow split, governance-first, economies of specificity — you've reduced the search space brutally. The model doesn't have to "invent wisdom." It navigates a map you already drew.
"Once you name the shapes, the search space collapses brutally. The model navigates a map — it doesn't invent wisdom."
Frameworks help AI systems in three specific ways:
1. Routing
"Which lane am I in?" Augment vs automate vs reinvent. Batch vs real-time. Internal vs customer-facing. Governed vs unmanaged. The framework tells the model WHERE to focus before it starts reasoning.
2. Compression
"What do I need to remember?" A few principles instead of a thousand anecdotes. The framework reduces a complex domain to its load-bearing abstractions.
3. Consistency
"What should I do next?" Repeatable choices, not vibes. The framework ensures the model makes the same high-quality decision regardless of how the question is phrased.
This is the "zero to 100 in one second" effect. Frameworks provide a high-bandwidth index into a worldview. The model didn't learn your whole worldview in one second — the frameworks provided a compression codec that makes the worldview navigable instantly.
Context Quality vs Context Quantity
The conventional wisdom: more context = better results. Bigger context window = bigger opportunity. RAG retrieves more chunks = more knowledge.
The reality: context is not just capacity — it's attention.
"Context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an 'attention budget' that they draw on when parsing large volumes of context."2
Every irrelevant token in the context window actively degrades the model's focus and output quality. It's not neutral noise — it's attention theft.
The practical consequence: 50K tokens of pure signal outperforms 200K tokens of noise + signal.4 Smaller, cleaner context beats bloated context every time.
Raw context gives you lots of text at high cost with low signal. Compressed context gives you a small number of high-leverage abstractions at low cost with high signal. The race for bigger context windows misses the point entirely. It's like making a bigger warehouse instead of improving your supply chain logistics.
The Cosmic Joke
Here's the punchline that should change how you invest: once you have the right context architecture — the thinking OS — the exact model matters less than everyone thinks.
Bigger models help, sure. But structure is how you turn capability into outcomes. A well-architected supply chain feeding a "last-generation" model will consistently outperform a frontier model running on raw context stuffing.
The evidence: frameworks compiled 6+ months ago still produce expert-level outputs from models with 18-month-stale training data. The frameworks bridge the training gap. The routing index provides vocabulary the model never saw in training. The compression patterns keep the context clean. The architecture does the heavy lifting.
"The cheeky cosmic joke: once you have the thinking OS, the exact model matters less than everyone thinks."
What this means for investment: stop chasing model upgrades. Start building framework libraries and routing indexes. The competitive moat shifts from "which model" to "which supply chain."
The industry is spending billions upgrading models while users still stuff raw text into system prompts. The 10x improvement isn't in the model — it's in the architecture around the model.
Setting Up the Architecture
If model intelligence isn't the answer, what is? Architecture. Specifically: a supply chain for cognition.
The next chapter names the architecture and its five stages. The rest of this ebook builds each stage.
Key Takeaways
- • Model intelligence is overrated for domain-specific work — context quality is the binding constraint
- • The Blue Whales Test diagnoses whether your problem is model capability or context architecture
- • Frameworks act as compression codecs that collapse the search space for AI systems
- • Attention is finite — smaller, cleaner context outperforms larger, noisier context
- • The competitive moat is shifting from model choice to context architecture
The Cognition Supply Chain
Manufacturing supply chains transformed raw materials into finished goods. Cognition needs the same architecture.
In manufacturing, nobody expects brilliant products from throwing raw materials at the factory floor and hoping. There's a supply chain: sourcing, processing, quality control, assembly, delivery. Each stage adds value. Each stage has discipline. The whole thing compounds because yesterday's production teaches you how to produce better tomorrow.
Yet that's exactly what most AI implementations do: throw raw text — system prompts, basic RAG chunks, unstructured documents — at a frontier model and wonder why the output feels like it was written by a well-read stranger who knows nothing about your business.
We call this architecture The Cognition Supply Chain — the end-to-end pipeline from raw knowledge to client-ready, domain-specific AI output.
Naming the Architecture
The vocabulary shift is deliberate. "AI recommendations" sounds like magic that should just work. "Cognition supply chain" sounds like engineering that needs discipline. It imports 20+ years of supply chain management thinking — routing, QC, compression, just-in-time delivery, feedback loops — and applies it to knowledge.
"The vocabulary shift matters. 'AI recommendations' sounds like magic. 'Cognition supply chain' sounds like engineering that needs discipline."
The Supply Chain Metaphor Mapped
Manufacturing Supply Chain
- • Raw materials → sourced, graded, inventoried
- • Processing → shaped, refined, quality-checked
- • Assembly → combined into finished product
- • Delivery → shipped to the right customer
- • Feedback loop → production data improves next run
Cognition Supply Chain
- • Raw materials → domain knowledge, frameworks, institutional memory
- • Processing → agentic exploration, cross-document traversal
- • Quality control → dual-query judgment, LLM judges
- • Assembly → sub-agent compression, distillation
- • Feedback loop → each cycle improves routing and frameworks
Why "supply chain" and not "pipeline"? Pipelines are linear. Supply chains have routing decisions, parallel processing, quality gates, inventory management (context), and — critically — they compound. Each delivery teaches you how to source and process better next time.
The Five Stages
Stage 1 — Route
Give the agent a map before it searches.
A routing index provides information scent — vocabulary, topology, canonical terms. Without it, the agent wanders through embedding space hoping to stumble on relevance. With it, the agent navigates directly to high-value content using terms guaranteed to exist in the corpus.
Stage 2 — Explore
Send agents to investigate, not just retrieve.
Move beyond "find me the top 5 chunks" to "explore this corpus like a human researcher." Three-phase loop: scan, deep dive, backtrack. Cross-document dependencies become first-class, not an accident of chunk recall.
Stage 3 — Judge
Separate retrieval from reasoning.
The dual-query pattern: one query for broad retrieval (high recall), another for precise judgment (high precision). An LLM judge filters, refines, or rejects candidates against the real goal. This prevents domain keywords from polluting semantic search.
Stage 4 — Compress
Distill exploration into high-density signal.
Sub-agents explore in isolated contexts (200K tokens of thrashing). The main agent reads only the compiled memo (20K tokens of pure signal). Scatter-gather for cognition: parallel exploration, centralised compilation.
Stage 5 — Compound
Each cycle improves the next.
Better frameworks lead to better routing, which leads to better exploration, which leads to better compression, which leads to improved frameworks. This is a flywheel, not a pipeline. Architecture investments compound; model upgrades don't.
| Stage | Name | Job | Key Pattern |
|---|---|---|---|
| 1 | Route | Give the agent a map | Routing index with information scent |
| 2 | Explore | Investigate, don't just retrieve | Three-phase agentic search |
| 3 | Judge | Separate retrieval from reasoning | Dual-query pattern |
| 4 | Compress | Distill to high-density signal | Sub-agent scatter-gather |
| 5 | Compound | Each cycle improves the next | Kernel flywheel |
The Retrieval Maturity Ladder
Most organisations think they're advanced with AI because they've implemented RAG. Here's where they actually sit:
Level 1: System Prompt Stuffing
Paste ~2,500 tokens of context into the system prompt. Cheap and fast, but brittle: there's no search capability, and the content goes stale immediately. Where most "we're using AI" organisations actually are.
Level 2: Tool-Calling RAG
LLM with tool-calling to RAG. The "old way." Single-shot: query, top-k chunks, synthesise. No iteration, no refinement, no gap detection. The model can't reformulate queries or spot missing context.
Level 3: Agentic RAG
Agent-driven RAG. The loop matters: the agent can reformulate queries, spot gaps, ask for more. But still limited to chunk similarity — can't follow cross-document references. Where many "advanced" implementations sit today.
Level 4: Agentic Exploration with Compression
Full exploration engine: scan, deep dive, backtrack with tool access. Sub-agent compression: 200K to 20K. Dual-query judgment: separate retrieval from evaluation. The cognition supply chain lives here.
Why Architecture Compounds
Model upgrades are one-time lifts. You upgrade from Model A to Model B. Output improves marginally. Then it plateaus until the next upgrade. No compounding.
Architecture compounds. Every framework you build, every routing entry you add, every compression pattern you refine — makes the NEXT interaction better. The system learns (through the human closing the loop), not just the model. Research from Andrew Ng demonstrates that GPT-3.5 with proper agentic architecture outperforms GPT-4 alone6 — proving architecture matters more than raw model capability.
Your competitor can buy the same model tomorrow. They can't copy your framework library, your routing index, your compression patterns, or your institutional knowledge encoded into the supply chain.
This connects to Worldview Recursive Compression5: compile your domain expertise into frameworks that serve as the "kernel" loaded into every AI interaction. The supply chain IS the kernel in action.
The Industry Is Already Moving This Way
The cognition supply chain isn't theoretical. Pieces of it are already shipping in production systems:
Tavily
Search, extract, LLM answer. Two-step retrieval with judgment built in. Their /search endpoint returns concise snippets optimised for LLM ingestion; /extract pulls full cleaned content; include_answer adds LLM synthesis. This is Stages 2 + 3 commercialised.7
Claude Web Search
Agentic loop — Claude decides when to search, the API runs searches, and this can repeat multiple times in one request. The model is exploring, not just retrieving. Stage 2 as a platform feature.8
Anthropic Multi-Agent Research
Multi-agent orchestration showing 90.2% success rate vs 14–23% for single-agent approaches.9 Parallel sub-agents with compression. Stages 4 + 5 validated at scale.
The industry is building the supply chain piece by piece. We're naming the whole architecture.
The Map Ahead
Part II builds each stage in detail:
- • Chapter 3: Routing — how to give the agent a map
- • Chapter 4: Exploration — how to search properly (not just retrieve)
- • Chapter 5: Compression — how to distill 200K tokens to 20K of signal
- • Chapter 6: Compounding — why each cycle makes the system smarter
Part III then shows the supply chain applied to real workflows: proposals and research.
Key Takeaways
- • The Cognition Supply Chain has five stages: Route → Explore → Judge → Compress → Compound
- • Most organisations sit at retrieval maturity Level 1 or 2 while believing they're advanced
- • Architecture compounds; model upgrades don't
- • The industry is converging on this pattern — Tavily, Claude, Anthropic research all validate pieces of it
Give the Agent a Map
When an agent starts cold against a corpus, it has two problems: it doesn't know what words exist in your world, and it doesn't know what matters.
When an agent starts against your corpus for the first time, it faces two problems simultaneously. The vocabulary problem: it doesn't know what terms exist in your world. Your organisation has named concepts, acronyms, frameworks, and domain-specific language the model has never seen in training. The topology problem: it doesn't know what's important versus peripheral, foundational versus derivative, central versus edge-case.
Without solving both, the agent performs expensive random walks through embedding space — burning tokens, returning noise, and sometimes missing the exact content that would have answered the question perfectly.
The fix: give the agent a geography map before it starts searching.
Information Foraging Theory
Information Foraging Theory10, from cognitive science and human-computer interaction research, explains how agents — both human and AI — navigate knowledge spaces. The core concept is information scent: cues that help an agent estimate "is it worth going down this path?" before paying the cost of opening, reading, and reasoning.
Good websites beat bad ones for the same reason: people don't "search" — they forage. The better the scent (clear labels, meaningful categories), the fewer wrong turns. A well-designed navigation menu lets you estimate "will I find what I need down this path?" before clicking.
Applied to AI agents: a routing index provides information scent for the corpus. The agent can estimate value before full exploration — just like a user scanning a well-designed navigation menu.
Without information scent: The agent's only vocabulary comes from your problem statement. Those words might not match anything in your knowledge base. The agent searches with the wrong terms, retrieves irrelevant chunks, and produces generic output.
With information scent: The agent reads the map, discovers canonical terms that ARE in the corpus, and immediately searches with terms guaranteed to hit.
The Routing Index Pattern
A routing index is a machine-readable map of your knowledge corpus. It's NOT documentation. Not a README. Not a wiki. It's a control surface for agent navigation.
"Instead of starting the search empty, you're giving the LLM a geography map."
The structure of each entry (the Lane Doctrine entry, for example):
- Name: Lane Doctrine
- Importance: Prevents high-failure AI deployments by scoring projects for structural fit
- Retrieve when: evaluating or prioritising AI initiatives, deciding between automation approaches
- Tags: deployment, project selection, batch processing, governance, risk
- Related: Simplicity Inversion, Enterprise AI Spectrum, Fast-Slow Split
- URL: /frameworks/lane-doctrine.md
The crucial insight: every term in the routing index is guaranteed to exist in the corpus. This turns semantic search from "hopefully something matches" to "I know these words will hit." The vocabulary problem and the topology problem are solved simultaneously.
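As a concrete sketch, one way to hold such an entry in machine-readable form; the field values mirror the Lane Doctrine example above, but the exact schema is an assumption, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class RoutingEntry:
    """One entry in the routing index: a control surface, not documentation."""
    name: str
    importance: str           # why this framework matters, in one line
    retrieve_when: list[str]  # triggers: situations where the agent should look deeper
    tags: list[str]
    related: list[str]        # adjacent frameworks, for deliberate backtracking
    url: str                  # where the full content lives in the corpus

lane_doctrine = RoutingEntry(
    name="Lane Doctrine",
    importance="Prevents high-failure AI deployments by scoring projects for structural fit",
    retrieve_when=["evaluating or prioritising AI initiatives",
                   "deciding between automation approaches"],
    tags=["deployment", "project selection", "batch processing", "governance", "risk"],
    related=["Simplicity Inversion", "Enterprise AI Spectrum", "Fast-Slow Split"],
    url="/frameworks/lane-doctrine.md",
)

# Every term in this entry exists verbatim in the corpus, so any query the
# agent builds from these fields is guaranteed to hit.
```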
A Control Surface, Not Documentation
The routing index doesn't EXPLAIN the frameworks. It provides just enough scent for the agent to decide whether to look deeper. One-liner descriptions, not full explanations. Tags for cross-referencing, not comprehensive summaries. "Retrieve when" triggers, not usage guides.
Three capabilities a routing index enables:
Guided Exploration
Start at high-level nodes, drill down only when needed. The agent doesn't read everything — it reads the map, then reads what matters.
Backtracking by Design
If the chosen branch isn't yielding evidence, the agent can deliberately shift to an adjacent framework rather than flailing in embedding space. The "Related" field makes this deliberate.
Query Expansion
The index provides canonical terms the agent can use to formulate better queries — without polluting retrieval with client-specific noise. It's a query expansion oracle with zero false matches.
There's academic work validating this approach. RAPTOR11 builds tree-structured summaries for hierarchical retrieval — retrieving across levels of abstraction. A routing index is a lightweight, human-curated version of that idea: both high-level map (global context) and drill-down capability (local detail).
"Regardless of your retrieval strategy — RAG, agentic search, whatever — the routing index is super powerful."
Worked Example: From Cold Start to Targeted Search
Before and After: The Routing Index Effect
Before: Cold Start
- Problem: "Our manufacturing client wants to implement AI. Where should they start?"
- Agent vocabulary: "manufacturing", "AI", "implement", "start"
- RAG returns: Chunks about manufacturing processes, general AI introductions, implementation timelines
- Result: Generic advice that could have come from any blog post
The agent searched with the only words it had — problem terms that don't match corpus terms.
After: With Routing Index
- Agent reads index. Sees: Lane Doctrine ("when evaluating where to deploy AI"), Simplicity Inversion ("when choosing starting point"), Enterprise AI Spectrum ("when matching complexity to readiness")
- Agent searches with: "lane doctrine manufacturing", "simplicity inversion starting lane", "readiness diagnostic"
- RAG returns: Specific framework content about scoring deployment lanes, starting with internal batch processes, matching autonomy to governance
- Result: Specific, framework-grounded recommendations tailored to manufacturing
Same model, same corpus, same RAG. The only change: the agent got a map first.
| | Cold Start | With Routing Index |
|---|---|---|
| Query vocabulary | Problem terms only | Problem terms + canonical framework terms |
| Search precision | Low (generic matches) | High (guaranteed corpus hits) |
| Cross-references | Accidental | Deliberate (related concepts in index) |
| Backtracking | Random | Structured (adjacent entries) |
| Time to relevant content | Minutes of wandering | Seconds of navigation |
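A minimal sketch of the query-expansion step behind that difference, assuming the index is loaded as a simple list of entries. The matching heuristic below is deliberately naive tag overlap; in practice the agent reads the index and chooses for itself, and the entries shown are illustrative.

```python
# Hypothetical routing index entries, trimmed to what query expansion needs.
ROUTING_INDEX = [
    {"name": "Lane Doctrine", "tags": {"deployment", "project selection", "governance", "risk"}},
    {"name": "Simplicity Inversion", "tags": {"starting point", "internal tools", "deployment"}},
    {"name": "Enterprise AI Spectrum", "tags": {"readiness", "maturity", "governance"}},
]

def expand_query(problem_terms: set[str]) -> list[str]:
    """Build search queries from canonical framework terms, not just problem terms."""
    problem = " ".join(sorted(problem_terms))
    queries = [f'{entry["name"]} {problem}'
               for entry in ROUTING_INDEX
               if problem_terms & entry["tags"]]   # crude scent check: any tag overlap
    return queries or [problem]                    # cold start: problem terms only

print(expand_query({"manufacturing", "deployment"}))
# ['Lane Doctrine deployment manufacturing', 'Simplicity Inversion deployment manufacturing']
```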
Building Your Routing Index
Start small. 10–15 entries. Don't try to index everything. Start with your most-used frameworks, most-referenced documents, most-important concepts.
Iteration pattern: After each significant AI interaction, ask: "Was there a concept the agent should have known about but didn't?" If yes, add a routing entry. The index grows organically. Every routing entry you add improves the next interaction. The index compounds.
Time investment: 1–2 hours for the initial 10–15 entries. 5 minutes per new entry thereafter.
From Map to Exploration
The routing index is Stage 1 of the cognition supply chain. It solves the cold-start problem by providing information scent — vocabulary and topology — before the agent searches. With a map in hand, the agent can explore properly. The next chapter shows how exploration differs from simple retrieval — and why the dual-query pattern prevents retrieval from being polluted by reasoning needs.
Key Takeaways
- • Agents face two cold-start problems: vocabulary (what terms exist?) and topology (what's important?)
- • Routing indexes provide information scent — the agent estimates value before paying exploration costs
- • A routing index is a control surface, not documentation: name, importance, "retrieve when" trigger, tags, related concepts, URL
- • Works regardless of retrieval strategy (RAG, agentic, hybrid)
- • Start with 10–15 entries and iterate — the index compounds with use
Explore, Don't Just Retrieve
Keyword search returns 10 thumbnails for "kookaburra." An LLM judge picks the one showing a zoomed-in kookaburra in a gum tree. Same pattern, generalised to text.
Here's a concrete example of the problem — and the fix. An image search system built on a public photo library: keyword search for "kookaburra" returns 10 thumbnails. Some are close-ups, some distant shots, some on fences, in flight, on power lines. The search found "kookaburra" — broad recall, job done.
But what was actually wanted: a zoomed-in kookaburra sitting in a gum tree.
The fix: send a keyword search to get candidates. Send a SEPARATE rich context question to an LLM judge: "Which of these shows a zoomed-in kookaburra in a gum tree?" The search finds candidates. The judge picks the right one.
This same pattern — broad retrieval + precise judgment — generalises to text retrieval. And it's the key to Stages 2 and 3 of the cognition supply chain.
RAG's Fundamental Limit
Standard RAG: query, vector similarity, top-k chunks, synthesis. Find the chunks most semantically similar to your query. Good enough for simple, local questions: "Find the paragraph about X."
What standard RAG can't do:
- × Cross-document dependencies: "X is defined in doc A, constrained by doc B, and exceptioned by doc C." Chunk similarity doesn't represent these relationships.
- × Global context: Chunking loses the big picture.11 Each chunk is an island, divorced from its position in the larger argument.
- × Dependency-shaped knowledge: Most real enterprise knowledge IS dependency-shaped — policies reference contracts, specs depend on other specs, understanding requires traversal.
Chunking loses context. Semantic similarity looks at a few chunks similar to your query and completely ignores cross-references.12 If your RAG works brilliantly for "find me a quote about X" but fails for "how does policy X interact with exception Y from contract Z" — you're hitting RAG's architectural ceiling, not a model limitation.
Agentic File Search — Exploration as Retrieval
"Stop pretending retrieval is a single query-time lookup. Make it an agentic investigation."
Coding agents — Claude Code, Cursor, Copilot — don't rely on semantic similarity alone. They skim, search, jump to definitions, follow imports, "open the next file because that file points to it."13 The insight: generalise this pattern from code to ALL documents.
Three-Phase Exploration
Phase 1: Parallel Scan
Preview all docs quickly. LLM identifies which potentially contain relevant information by reading starts/headers/summaries.
Phase 2: Deep Dive
Fully parse and read only the promising docs. LLM identifies missed cross-references during reading.
Phase 3: Backtrack
Follow cross-references to docs missed in the initial scan. The agent picks up what it now knows it needs.
This three-phase exploration pattern — scan, deep dive, backtrack — turns retrieval from a single lookup into an investigation. The tools are simple: scan folder, preview, parse, read, regex search, glob. It's "Claude Code energy" for document corpora, not just code.
The key difference from RAG: cross-document dependencies become first-class. The agent FOLLOWS references rather than hoping chunk similarity catches them accidentally.
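A skeletal version of the three-phase loop, assuming a folder of markdown documents and deliberately crude stand-ins for the LLM relevance check and cross-reference detection. The helper names are illustrative, not an established API.

```python
from pathlib import Path
import re

def preview(path: Path, n_chars: int = 800) -> str:
    """Phase 1 helper: read just the start of a document (headers, summary)."""
    return path.read_text(errors="ignore")[:n_chars]

def looks_relevant(text: str, goal: str) -> bool:
    """Crude stand-in for an LLM relevance check against the goal."""
    return any(term.lower() in text.lower() for term in goal.split())

def find_references(text: str) -> list[str]:
    """Stand-in for spotting cross-references such as 'see policy-x.md'."""
    return re.findall(r"see\s+([\w\-]+\.md)", text, flags=re.IGNORECASE)

def explore(corpus_dir: str, goal: str) -> dict[str, str]:
    docs = list(Path(corpus_dir).glob("**/*.md"))
    # Phase 1: parallel scan. Preview everything cheaply.
    to_read = [d for d in docs if looks_relevant(preview(d), goal)]
    findings: dict[str, str] = {}
    seen: set[str] = set()
    while to_read:
        doc = to_read.pop()
        if doc.name in seen:
            continue
        seen.add(doc.name)
        # Phase 2: deep dive. Fully read only the promising docs.
        text = doc.read_text(errors="ignore")
        findings[doc.name] = text
        # Phase 3: backtrack. Follow cross-references missed in the initial scan.
        for ref in find_references(text):
            ref_path = Path(corpus_dir) / ref
            if ref_path.exists() and ref not in seen:
                to_read.append(ref_path)
    return findings
```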
The Dual-Query Pattern
Here's the core insight most retrieval systems miss: retrieval and reasoning want different inputs. Conflating them — which is what everyone does — guarantees mediocre results.
Consider the manufacturing client problem. Your semantic query: "problems with AI implementation." What you really need: "manufacturing client, early-stage, struggling to find where to start." If you put "manufacturing" into the semantic search, you'll get content about manufacturing PROCESSES — completely off-target. The manufacturing context belongs in the judgment phase, not the retrieval phase.
The pattern separates these concerns:
Retrieval Query
Broad, recall-heavy. "AI implementation failure modes, where to start." Don't miss anything. Optimised for coverage.
Goal Specification
Rich, intention-heavy. "Manufacturing client, early-stage, needs safe starting lane, risk-averse board." Full context of what you actually need. Optimised for precision.
Judge / Reranker
LLM examines retrieved candidates against the goal spec. Selects, refines, or rejects. You get broad recall AND precise selection.
"AI implementation failure modes, starting points, common mistakes"
[Broad, keyword-focused, optimised for recall]
GOAL SPECIFICATION (sent to LLM judge):
"We have a manufacturing client (500 employees, early-stage AI,
risk-averse board). They need to identify their safest starting
lane. Focus on deployment patterns that minimise governance burden
and maximise early wins."
[Rich, intention-focused, optimised for precision]
JUDGE INSTRUCTION:
"Review the retrieved candidates against the goal specification.
Select the 3 most relevant. Explain why each was selected.
Note any gaps the retrieval didn't cover."
The kookaburra proof shows this works in production: search query = "kookaburra" (broad). Goal spec = "zoomed-in kookaburra in a gum tree" (specific). Judge = LLM reviews thumbnails against goal spec, picks the best match. Same pattern, proven and generalised.
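A minimal sketch of the split, with `vector_search` and `call_llm` as placeholders for your retrieval stack and model client. Both helpers are assumptions; the prompts echo the example above.

```python
def vector_search(query: str, top_k: int = 20) -> list[str]:
    """Placeholder: swap in your vector store's similarity search."""
    return [f"candidate chunk {i} for query: {query}" for i in range(top_k)]

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your model client."""
    return "[judged selection + reasons + gaps]"

def dual_query(retrieval_query: str, goal_spec: str, top_n: int = 3) -> str:
    # Broad, recall-heavy retrieval: don't miss anything.
    candidates = vector_search(retrieval_query, top_k=20)
    # Rich, intention-heavy judgment: evaluate candidates against the real goal.
    judge_prompt = (
        f"Goal specification:\n{goal_spec}\n\n"
        f"Candidates:\n" + "\n---\n".join(candidates) + "\n\n"
        f"Select the {top_n} most relevant candidates, explain why each was "
        f"selected, and note any gaps the retrieval didn't cover."
    )
    return call_llm(judge_prompt)

result = dual_query(
    retrieval_query="AI implementation failure modes, starting points, common mistakes",
    goal_spec=("Manufacturing client, 500 employees, early-stage AI, risk-averse board. "
               "They need their safest starting lane with minimal governance burden."),
)
print(result)
```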
The Industry Is Converging on This Pattern
This isn't theoretical. The search-then-judge pattern is already shipping:
Tavily implements search, extract, then LLM answer.7 Their /search endpoint returns concise snippets optimised for LLM ingestion. /extract pulls full cleaned content. include_answer adds LLM synthesis. This IS the dual-query pattern commercialised: search (broad) then extract (relevant) then answer (judged).
Claude web search operates as an agentic loop — Claude decides when to search, the API runs searches, and this can repeat multiple times in one request.8 The model is exploring, not just retrieving.
LlamaIndex file-explorer agents match RAG quality for complex queries but with higher latency.14 Better for background/async tasks while RAG wins for real-time. The discriminant variable: latency tolerance vs depth requirements. This maps directly to the Fast-Slow Split: exploration is the slow lane doing heavy cognition; interaction is the fast lane.
The Hybrid Architecture
You don't choose RAG OR exploration. You use both.
RAG vs Exploration: Different Tools for Different Jobs
RAG (Fast Lane)
- • Low-latency, single-document queries
- • "Find me the section about..."
- • Broad candidate generation
- • The broad net
Exploration (Slow Lane)
- • Dependency-shaped, cross-document queries
- • "How does X interact with Y given Z?"
- • Deep truth-finding with audit trail
- • The scalpel
Integration pattern: Use RAG results as starting points for exploration. RAG generates candidates; exploration agents follow the most promising ones, reading full documents, following cross-references, building a complete picture.
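A small sketch of that handoff, where RAG proposes starting points and exploration agents take it from there. Both helpers are placeholders and the names are illustrative.

```python
def vector_search(query: str, top_k: int = 5) -> list[str]:
    """Placeholder: fast-lane RAG returning candidate document paths."""
    candidates = ["policies/ai-governance.md", "contracts/msa-acme.md"]  # illustrative
    return candidates[:top_k]

def explore_from(doc_path: str, goal: str) -> str:
    """Placeholder: slow-lane exploration starting at one document,
    following cross-references and returning a compiled finding."""
    return f"[findings from {doc_path} and its cross-references, judged against: {goal}]"

def hybrid_answer(question: str, goal: str) -> list[str]:
    candidates = vector_search(question)                  # the broad net, low latency
    return [explore_from(c, goal) for c in candidates]    # the scalpel, per candidate
```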
From Exploration to Compression
The dual-query pattern and agentic exploration are Stages 2 and 3 of the cognition supply chain. But exploration generates volume — 200K tokens of thrashing, contradictions, and dead ends. The next chapter shows how to compress this into the 20K tokens of signal the main agent actually needs.
Key Takeaways
- • RAG fails for dependency-shaped knowledge — cross-document, cross-reference questions
- • Agentic exploration generalises coding-agent patterns to document corpora: scan → deep dive → backtrack
- • The dual-query pattern separates retrieval (broad, recall-heavy) from judgment (rich, intention-heavy)
- • The industry is already converging: Tavily, Claude web search, LlamaIndex file explorers
- • Hybrid is the right answer: RAG for speed, exploration for depth
Compress Through Sub-Agents
A sub-agent spends 200,000 tokens on research. The outcome is 20,000 tokens of applicable findings. The 200,000 vanish — but the 20,000 that remain are pure signal.
A sub-agent might spend 200,000 tokens on its work — the original briefing, the research, the dead ends, the contradictions, the writing and rewriting. But its outcome is 20,000 tokens of applicable research. You've taken the framework roadmap, expanded it for this particular client, then compressed it back.
200,000 tokens vanish. But the 20,000 that remain are pure signal. This is Stage 4 of the cognition supply chain: the compression step that turns messy exploration into high-density, decision-ready output.
The Scatter-Gather Pattern for Cognition
The architecture is clean. The main agent is the orchestrator: it holds the map, the client goal, evaluation criteria, and "don't be stupid" constraints. It preserves its context window for synthesis and judgment — the expensive work.
Sub-agents are the explorers. Each starts with a clean slate. Goes deep on a specific angle. Less biased by the main agent's current narrative because they don't share context.
The output is distilled artefacts: 20K tokens of "useful truth" instead of 200K tokens of wandering.
Why does context isolation matter so much? Sub-agents can thrash around, contradict themselves, chase weird leads — and the main agent never sees the mess. The main agent only accepts the compiled memo. It's quality control by architecture, not by hope. Messy intermediate reasoning never pollutes the parent context.
The economics: burn cheap tokens on exploration (sub-agent private contexts are disposable). Preserve expensive main-context tokens for synthesis and judgment (the orchestrator's context is precious real estate).
"200,000 tokens vanish, but the 20,000 that remain are pure signal."
Why Sub-Agents Work So Well
Anthropic's own research validates this pattern directly, showing multi-agent orchestration achieves a 90.2% success rate versus 14–23% for single-agent approaches in their research evaluations:9
"The essence of search is compression: distilling insights from a vast corpus. Subagents facilitate compression by operating in parallel with their own context windows, exploring different aspects of the question simultaneously."15
Parallel tool calling cuts research time by up to 90% for complex queries.16 Multi-agent research systems excel especially for breadth-first queries with multiple independent directions.17
Sub-agents operate as ephemeral sandboxes — spin up with a narrow brief, explore deeply, emit a clean artefact, terminate. This is Context Engineering's fork-exec pattern: the Unix model of process isolation applied to cognition.18
The Priming Problem — And How to Solve It
Sub-agents start empty. They don't share the main agent's context. You've got to prime them about what they're trying to do, which is a real hassle.
The mistake most people make: priming like they're telling a story. Long, narrative descriptions of the context, the background, the history, the nuances. This wastes tokens, is imprecise, and risks the sub-agent misunderstanding the mission entirely.
The fix: stop priming like a story. Start priming like a function call. A briefing packet with a stable schema.
The Briefing Packet Schema
| Field | Purpose | Example |
|---|---|---|
| Mission | What decision/outcome are we enabling? | "Identify which 3 of our 30 frameworks best apply to this manufacturing client" |
| Non-goals | What NOT to spend tokens on | "Don't evaluate financial ROI. Don't compare vendors." |
| Constraints | Governance, risk, tools allowed | "Australian context only. Use RAG search, not web." |
| Client context | The 10 facts that matter (not 100) | "500 employees, manufacturing, early-stage AI, risk-averse board" |
| Framework anchors | Relevant named frameworks + one-liners | "Lane Doctrine (deploy where physics is on your side), Simplicity Inversion (start with 'complex' internal tools)" |
| Deliverable format | Exactly what to return | "Return: (1) findings, (2) implications, (3) recommended moves, (4) citations, (5) open questions" |
Schema > narrative: the sub-agent knows exactly what job it has, what to ignore, and what format to return. No ambiguity. No wasted tokens on understanding intent.
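One way to make the schema concrete: a dataclass whose fields mirror the table above, with a rendering step that keeps the briefing function-call-shaped rather than narrative. The rendering format itself is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class BriefingPacket:
    mission: str                  # what decision/outcome we are enabling
    non_goals: list[str]          # what NOT to spend tokens on
    constraints: list[str]        # governance, risk, tools allowed
    client_context: list[str]     # the 10 facts that matter, not 100
    framework_anchors: list[str]  # relevant named frameworks + one-liners
    deliverable_format: str       # exactly what to return

    def to_prompt(self) -> str:
        """Render the packet as a function-call-style briefing, not a story."""
        return "\n".join([
            f"MISSION: {self.mission}",
            "NON-GOALS: " + "; ".join(self.non_goals),
            "CONSTRAINTS: " + "; ".join(self.constraints),
            "CLIENT CONTEXT: " + "; ".join(self.client_context),
            "FRAMEWORK ANCHORS: " + "; ".join(self.framework_anchors),
            f"DELIVERABLE FORMAT: {self.deliverable_format}",
        ])

packet = BriefingPacket(
    mission="Identify which 3 of our 30 frameworks best apply to this manufacturing client",
    non_goals=["Don't evaluate financial ROI", "Don't compare vendors"],
    constraints=["Australian context only", "Use RAG search, not web"],
    client_context=["500 employees", "manufacturing", "early-stage AI", "risk-averse board"],
    framework_anchors=["Lane Doctrine (deploy where physics is on your side)",
                       "Simplicity Inversion (start with 'complex' internal tools)"],
    deliverable_format="(1) findings, (2) implications, (3) recommended moves, "
                       "(4) citations, (5) open questions",
)
```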
The Self-Priming Trick
Sometimes the sub-agent misunderstands the mission. It goes deep on the wrong angle, burns 200K tokens, and returns irrelevant findings. Expensive failure.
The fix: force the sub-agent to first produce a tiny "I understand the job" recap in your framework language before it starts researching.
Two benefits: it catches misunderstanding early (if the recap is wrong, you can correct before burning exploration tokens), and it aligns subsequent reasoning (the sub-agent is now reasoning in your framework handles, making its later work more coherent).
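A small sketch of that gate in code, with `call_subagent` and the recap check as placeholders. All names are assumptions; the check could be a human glance or another LLM call.

```python
def call_subagent(prompt: str) -> str:
    """Placeholder for a sub-agent call in its own isolated context."""
    return "[sub-agent response]"

def recap_is_on_mission(recap: str, briefing: str) -> bool:
    """Placeholder: an LLM (or a human) checks the recap against the briefing."""
    return True

def run_with_self_priming(briefing: str) -> str:
    # Step 1: a cheap recap (~200 tokens) in our framework language, before any research.
    recap = call_subagent(briefing + "\n\nBefore researching, restate the job in our "
                          "framework language in under 200 tokens.")
    # Step 2: gate. Correct misunderstanding now, not after 200K tokens of exploration.
    if not recap_is_on_mission(recap, briefing):
        raise ValueError(f"Sub-agent misread the mission: {recap}")
    # Step 3: only then release the expensive exploration budget.
    return call_subagent(briefing + "\n\nYour confirmed understanding:\n" + recap +
                         "\n\nNow carry out the mission.")
```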
Token Economics of Sub-Agent Compression
The numbers from production workflows:
Token Economics: With and Without Sub-Agents
| | Exploration | Compilation | What Main Agent Sees |
|---|---|---|---|
| 1 sub-agent | 200K tokens | 20K output | 20K pure signal |
| 5 parallel sub-agents | 1M tokens total | 100K combined output | 100K curated signal |
| Without sub-agents | 1M tokens in main context | N/A | 1M noise + signal mixed |
The sub-agent pattern doesn't just save tokens — it saves attention. The main agent's context stays clean.
The main agent's context window is precious — every token affects attention quality. Sub-agents burn cheap tokens (private context, disposable). The main agent reads expensive tokens (curated, high-signal). It's like hiring 10 research assistants who each write you a one-page memo. You read 10 pages, not 1,000.
Practical Patterns for Sub-Agent Compression
Pattern 1: Research Agent
Receives briefing packet + access to RAG/search. Explores a specific question. Returns: findings + evidence + citations + open questions.
Pattern 2: Framework Application Agent
Receives one framework + client context. Applies the framework to the client. Returns: applicability assessment + specific recommendations + caveats.
Pattern 3: Critique Agent
Receives a draft output + evaluation criteria. Reviews for gaps, contradictions, unsupported claims. Returns: issues found + severity + suggested fixes.
Pattern 4: Compiler Agent
Receives multiple sub-agent outputs. Merges, resolves conflicts, eliminates duplications, synthesises into coherent narrative. Returns: unified findings with conflict resolution notes.
From Compression to Compounding
The supply chain now routes, explores, judges, and compresses. But the real power emerges in Stage 5: when each cycle feeds the next. The compounding loop is what turns a pipeline into a flywheel.
Key Takeaways
- • Main agent = orchestrator (preserve context for synthesis). Sub-agents = explorers (burn tokens in isolation).
- • Context isolation is quality control by architecture — messy reasoning never pollutes the parent
- • Prime sub-agents with a briefing packet (schema), not a narrative. Mission, Non-goals, Constraints, Client context, Framework anchors, Deliverable format.
- • Self-priming trick: force a 200-token recap before 200K tokens of exploration
- • The 10:1 compression ratio: 200K tokens vanish, 20K of signal remain
The Compounding Loop
Chess move-ordering — opening books bias search toward high-value lines first. Same principle applied to cognition.
In chess programming, opening books let the engine play known-good moves instantly — zero search cost. Move-ordering heuristics bias the search toward the most promising branches first.19 Iterative deepening searches shallow, finds candidates, then deepens the best lines under a time budget.
This is exactly what the cognition supply chain does — but for knowledge work. Frameworks are the opening book. The routing index is move ordering. Agentic exploration is iterative deepening. And after each game, the engine learns which openings worked.
The difference between a pipeline and a flywheel: a pipeline processes inputs into outputs. A flywheel processes inputs into outputs AND improves itself. Stage 5 of the supply chain is what makes it compound.
The Compounding Information Foraging Loop
When agents can retrieve, each read changes the next question. The agent starts with a broad query informed by the routing index (Chapter 3). Retrieval returns candidates. The agent reads them. Reading changes the agent's understanding. The next query is sharper. Sharper query → more relevant retrieval → deeper understanding → even sharper query.20
This is information foraging with feedback — a compounding loop where retrieval improves reasoning AND reasoning improves retrieval.
The agent doesn't follow a predetermined sequence. It walks its own path through the frameworks based on what it thinks will help solve the current issue. Give a thin overview of frameworks plus a mechanism to search deeper. The agent sees the one-liners, gets the lay of the land, then digs into what's relevant. It's not a script — it's directed foraging.
This is iterative deepening in action: start with a thin map (routing index), pick a promising branch (highest-scent entry), expand it (deep search), re-score the landscape, expand again. Each cycle is informed by everything learned so far. The thin map becomes progressively richer without ever being loaded in full.
Frameworks as Move-Ordering Heuristics
The chess analogy isn't decorative — it's structurally accurate. Every element of the cognition supply chain maps to a chess search mechanism.
The Chess-Cognition Parallel
Chess Engine
- • Opening books — known-good positions, zero search cost
- • Move-ordering heuristics — explore best candidates first
- • Iterative deepening — search shallow, then deepen best lines
- • Evaluation function — score positions against criteria
- • Transposition table — cache explored positions
Cognition Supply Chain
- • Routing index — canonical terms, instant navigation
- • Frameworks — bias search toward high-value paths
- • Agentic exploration — scan, deep dive, backtrack
- • Dual-query judgment — score against goal specification
- • Documented dead ends — don't re-explore what failed
Frameworks act as move-ordering heuristics: they bias the search toward high-value lines first. The Lane Doctrine says "don't start in the hardest lane." The Fast-Slow Split says "separate the talker from the thinker." Governance-first says "build the safety infrastructure before deploying." These aren't suggestions — they're search-space pruning. They tell the agent which branches to explore FIRST and which to skip.
The routing index functions as an opening book: fast access to known-good structures and sharp definitions so the system doesn't waste time reinventing concepts already solved. The agent plays the opening instantly, then starts thinking deeply once it's past known territory.
Agentic exploration is iterative deepening: search shallow first to find candidate lines, then deepen the best ones under a token budget.21 Just as a chess engine won't spend 10 minutes analysing a line it scored poorly in the first pass, the cognition supply chain won't burn 200K tokens exploring a path the routing index flagged as low-value.
The dual-query pattern (Chapter 4) serves as the evaluation function: score candidates against the goal specification, not just by pattern matching. The manufacturing context stays in the evaluation, not the search query — just as a chess evaluation function considers piece position without rerunning the move generator.
Why move ordering matters: without it, the agent explores branches in random order, burning tokens on low-value paths. With framework-biased move ordering, high-value paths are explored first. This is chess-style search through idea space, where frameworks act as move-ordering heuristics — systematic exploration that prunes the tree before wasting compute on dead branches.
The Meta-Loop
The five stages aren't a linear pipeline. They're a flywheel where each revolution starts from a higher baseline than the last.
The Kernel Flywheel
Each cycle improves the next.22 Every framework you build today solves problems you haven't encountered yet. Every routing entry you add makes the next search more precise. Every sub-agent output you compress teaches you what's worth compressing.
This connects directly to the principle of compiling domain expertise into frameworks that serve as the "kernel" loaded into every AI interaction. The kernel flywheel IS the compounding loop: extract patterns → compress into frameworks → prefetch solutions → apply to new contexts → learn from application → improve kernel. The supply chain isn't just using the kernel — it's continuously refining it.
When Compounding Stalls — And How to Fix It
The compounding effect won't stay exponential forever. It hits diminishing returns unless you add two things:
1. A Scoring Function
What counts as "progress"? Risk reduced. Uncertainty collapsed. Decision narrowed. Value unlocked.
Without measuring, you can't tell if you're compounding or just spinning. The scoring function turns subjective "this feels better" into objective "we resolved 3 open questions and identified 2 new risks."
2. A Compression Step After Each Deepening
Convert what you learned into a tighter map. New rule. New heuristic. New "don't do this" constraint. New routing entry.
If you explore but don't compress the learning back into the system, the next cycle doesn't benefit. You explored, but you didn't compound.
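A sketch of the loop with those two additions made explicit (a scoring function and a compress-back step), using placeholder helpers throughout; all names and the scoring heuristic are assumptions.

```python
def explore_branch(query: str) -> dict:
    """Placeholder: one deepening cycle of retrieve, read, reason."""
    return {"findings": f"[findings for {query}]", "open_questions": 1, "risks_found": 2}

def score_progress(result: dict) -> float:
    """What counts as progress: uncertainty collapsed, risks surfaced, decisions narrowed."""
    return result["risks_found"] - 0.5 * result["open_questions"]

def compress_back(result: dict, routing_index: list[str]) -> None:
    """Convert what was learned into a tighter map: new routing entry, rule, or dead end."""
    routing_index.append(f"entry derived from: {result['findings']}")

def compounding_loop(seed_query: str, routing_index: list[str], budget: int = 3) -> None:
    query = seed_query
    for cycle in range(budget):
        result = explore_branch(query)
        if score_progress(result) <= 0:       # no measurable progress: stop spinning
            break
        compress_back(result, routing_index)  # the next cycle starts from a higher baseline
        query = f"{seed_query} (refined after cycle {cycle + 1})"
```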
The End-to-End Pipeline Pattern
Here's the complete architecture — all five stages together, showing how data flows through the supply chain and how each cycle feeds the next:
The Complete Cognition Supply Chain
1. Router Index (The Map)
Agent reads routing index → discovers canonical terms → selects exploration starting points. The map provides information scent (Chapter 3) so the agent navigates directly to high-value content.
2. Exploration Agents (Cheap, Parallel)
Multiple sub-agents scan → deep dive → backtrack across the corpus. Each starts with a briefing packet (Chapter 5). Each follows cross-document dependencies that RAG would miss.
3. Judge (Dual-Query Reranker)
Each exploration output evaluated against the goal specification (Chapter 4). LLM judge selects, refines, or rejects. Manufacturing keywords stay in the goal spec, not the search query.
4. Compiler Agent (Merge, Resolve, Compress)
Takes judged outputs from multiple explorers. Merges findings, resolves conflicts, eliminates duplication, compresses into high-density output. 200K tokens of exploration → 20K tokens of signal.23
5. Main Agent Reads Result (Synthesis)
The orchestrator reads 20K–100K tokens of compiled, judged, compressed signal. Synthesises into the final deliverable. Context stays clean. Attention stays focused.
6. → Next Cycle Starts Smarter
New routing entries from what was discovered. Refined frameworks from what was learned. Documented dead ends from what failed. The supply chain improves for next time.
This is the full cognition supply chain: from raw knowledge → through routing, exploration, judgment, compression → to client-ready, domain-specific output → with compounding improvement each cycle. Each stage has been built in Part II: routing (Chapter 3), exploration and judgment (Chapter 4), compression (Chapter 5), and now the compounding loop that ties them into a flywheel.
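As a closing sketch, the whole pipeline reduces to a short orchestration skeleton, with every helper standing in for the concrete patterns built in Chapters 3-5. The function names are assumptions, not a prescribed API.

```python
def read_routing_index() -> list[str]:                 # Stage 1: the map
    return ["Lane Doctrine", "Simplicity Inversion", "Fast-Slow Split"]

def run_explorer(entry: str, briefing: str) -> str:    # Stage 2: cheap, parallel exploration
    return f"[memo on {entry} for: {briefing}]"

def judge(memo: str, goal_spec: str) -> bool:          # Stage 3: dual-query judgment
    return True

def compile_memos(memos: list[str]) -> str:            # Stage 4: merge, resolve, compress
    return "\n\n".join(memos)

def synthesise(compiled: str, goal_spec: str) -> str:  # Stage 5a: main agent, clean context
    return f"[deliverable grounded in:\n{compiled}]"

def update_supply_chain(deliverable: str) -> None:     # Stage 5b: compound for the next cycle
    pass  # add routing entries, refine frameworks, record dead ends

def cognition_supply_chain(briefing: str, goal_spec: str) -> str:
    entries = read_routing_index()
    memos = [run_explorer(e, briefing) for e in entries]
    kept = [m for m in memos if judge(m, goal_spec)]
    compiled = compile_memos(kept)
    deliverable = synthesise(compiled, goal_spec)
    update_supply_chain(deliverable)
    return deliverable
```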
From Chat Toy to Operating System
What starts as "better retrieval" becomes something more fundamental. Track the trajectory:
"The system becomes less like a chat toy and more like an operating system for decision-making."
The system doesn't just answer questions. It accumulates intelligence. It builds institutional memory through its framework library and routing index. It improves with use. Every interaction either teaches it a new route or sharpens an existing one.
And this creates a genuine competitive moat. Your competitor can buy the same model tomorrow. They can subscribe to the same tools. They can even hire the same developers. What they can't copy: your framework library, your routing index, your compression patterns, your institutional knowledge encoded into the supply chain. Those took cycles to build. Each cycle compounded the last.
Architecture compounds. Models don't.24 That's the thesis of this entire ebook, proven stage by stage through Part II.
From Theory to Practice
Part II has built the full pipeline: route (Chapter 3), explore and judge (Chapter 4), compress (Chapter 5), compound (Chapter 6). Part III shows this same architecture applied to two real-world domains — proposals and research — then provides a practical implementation guide for building your own supply chain, stage by stage.
Key Takeaways
- • Compounding happens when retrieval improves reasoning AND reasoning improves retrieval
- • Frameworks are move-ordering heuristics that bias agent search toward high-value paths first
- • The meta-loop: time → cognition → artefacts → compression → better priors → better search
- • Compounding stalls without two additions: a scoring function and a compression step after each cycle
- • The complete pipeline: Router → Exploration agents → Judge → Compiler → Main agent → Next cycle starts smarter
- • Architecture creates a moat competitors can't copy by upgrading their model
The Proposal Supply Chain
30+ frameworks applied to a client. Only 6 make the proposal. The supply chain decides which 6 — and why.
You have 30+ proprietary frameworks. A new client just landed. They're in manufacturing, 500 employees, early-stage AI, risk-averse board. You need a bespoke proposal that applies exactly the right frameworks to exactly their situation.
The old way: read every framework manually. Decide which ones apply. Write the proposal from scratch. Takes days. And by framework #15, you've forgotten the nuances of framework #3.
The supply chain way: route the client context through 30+ frameworks using sub-agents. Judge which 6 are most valuable AND most differentiated. Compress into a proposal that reads like you've been studying their business for weeks. Hours, not days.
Mapping the Supply Chain to Proposals
The proposal workflow is the cognition supply chain (Chapter 2) applied to customisation at scale. Each stage maps directly:
Stage 1 — Route
The routing index identifies which frameworks MIGHT apply to this client. Client tags — "manufacturing", "500 employees", "early-stage AI", "risk-averse board" — are matched against each framework's "retrieve when" triggers.
Result: 15–20 frameworks flagged as potentially relevant (broad recall). The Lane Doctrine, Simplicity Inversion, Enterprise AI Spectrum, Three-Lens Framework, and a dozen others pass the initial filter.
Stage 2 — Explore
Sub-agents apply each framework to the client's specific context. Each receives one framework + client context via a briefing packet (Chapter 5). Each produces an applicability assessment, specific recommendations, caveats, and examples relevant to manufacturing.
Scale: 15–20 parallel explorations × ~10K tokens each = ~150K–200K tokens of exploration. Each sub-agent works in isolation — no attention fatigue, no cross-contamination between frameworks.
Stage 3 — Judge
Selection with diversity control. Not "pick the 6 best" (haphazard) — but sequential selection that ensures each framework adds unique value. The goal spec for the judge: "Which framework provides the most unique value for THIS client that the other selected frameworks don't already cover?"
Method: Pick the best one. Then the next best one that's different enough. Repeat until 6 — or until no more genuinely distinct value exists.
Stage 4 — Compress
15–20 framework analyses compressed into 6 selected proposal sections. Each section: framework name, why it applies to this client, specific recommendations, expected outcomes.
Output: ~20K tokens of proposal content from ~200K tokens of exploration. The 180K tokens of rejected frameworks aren't wasted — they've been evaluated and documented as "considered but not selected."
Stage 5 — Compound
Each proposal improves the framework library. Which frameworks consistently get selected for manufacturing clients? (Strengthen routing.) Which are never selected? (Refine or retire.) What client-specific patterns emerged? (New routing entries.) What gaps appeared? (New frameworks needed.)
Effect: The fifth proposal costs a fraction of the first. The supply chain infrastructure is reusable — only the client context changes.
The Selection Trick — Pick One at a Time
Tell a model to "pick 6 frameworks" and it selects haphazardly — often picking frameworks that overlap heavily, missing important diversity. The output feels redundant. Three of the six say variations of the same thing.
The fix: sequential selection with diversity constraints.
Sequential Selection Protocol
Round 1
"Pick the single most valuable framework for this manufacturing client." → Lane Doctrine (deploys where physics is on your side — critical for risk-averse boards).
Round 2
"Given Lane Doctrine is selected, pick the next most valuable that adds something Lane Doctrine doesn't cover." → Three-Lens Framework (stakeholder alignment — different concern from deployment lane).
Round 3
"Given Lane Doctrine + Three-Lens, pick the next..." → Simplicity Inversion (where to start — neither deployment lanes nor stakeholders).
Rounds 4–6
Continue until 6 selected — or until the model can't find another genuinely distinct, useful framework. Reaching 5 instead of 6 is signal, not failure.
This is Maximal Marginal Relevance (MMR) from information retrieval25: novelty-weighted selection that prevents near-duplicates. The same principle used in search result diversification — applied to framework selection.
| | Batch ("Pick 6") | Sequential ("Pick 1, then next 1") |
|---|---|---|
| Diversity | Poor — often selects overlapping frameworks | High — each selection explicitly diversifies |
| Quality | Moderate — compromises across all 6 | High — each pick optimised independently |
| Exception handling | None — always returns 6 | Natural — stops when no more distinct value |
| AI reliability | Models struggle with multi-criteria bulk selection | Models excel at single-choice comparison |
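A sketch of the sequential protocol. The "adds distinct value" judgment is reduced to a placeholder that a real implementation would hand to an LLM judge along with the full goal specification; all names are assumptions.

```python
def pick_best(candidates: list[str], already_selected: list[str], goal_spec: str) -> str | None:
    """Placeholder for the LLM judge. The real prompt would be roughly:
    'Given the goal specification and the already-selected frameworks, pick the
    single candidate that adds the most value they don't already cover, or
    answer NONE if nothing genuinely distinct remains.'"""
    remaining = [c for c in candidates if c not in already_selected]
    return remaining[0] if remaining else None   # stand-in for the real judgment

def sequential_select(frameworks: list[str], goal_spec: str, target: int = 6) -> list[str]:
    selected: list[str] = []
    while len(selected) < target:
        pick = pick_best(frameworks, selected, goal_spec)
        if pick is None:            # stopping at 5 is signal, not failure
            break
        selected.append(pick)
    return selected

chosen = sequential_select(
    ["Lane Doctrine", "Three-Lens Framework", "Simplicity Inversion",
     "Enterprise AI Spectrum", "Fast-Slow Split", "Governance-First",
     "Economies of Specificity"],
    goal_spec="Manufacturing client, 500 employees, early-stage AI, risk-averse board.",
)
```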
Meta-Credibility — The Proposal IS the Proof
"The proposal IS the proof. The way you sold them is the way you'll serve them."
The proposal itself demonstrates the supply chain in action. The client reads a bespoke, framework-grounded, deeply researched document and the implicit message is unmistakable: "If they can produce this for me before we've even signed, imagine what they can build FOR my business."30
You didn't produce a generic slide deck. You routed their specific context through proprietary frameworks, explored each one's applicability, judged which combination provides maximum unique value, and compressed it into a coherent narrative. The method IS the message.
This is the principle of economies of specificity27 applied to consulting. Industrial-era economics favoured standardisation — one niche, one offer, one template. When cognition is cheap and parallel, the economics flip: customising perfectly for each client costs less than maintaining generic materials.
The frameworks are source code. The proposal is the compiled binary. Fix the source once, and all future compilations improve. Each proposal is a regenerable artefact — not a one-off document crafted through heroic effort.
Token Economics of Proposal Generation
The production numbers: 30 frameworks × sub-agent application = ~300K exploration tokens. Selection and compression produces ~20K tokens of proposal content. Total API cost: roughly $3–5 for a proposal that reads like weeks of research.
The compounding economics are even more compelling. The first proposal is the most expensive — building the routing index, refining the briefing packets, tuning the selection protocol. By the fifth proposal, the infrastructure is proven.29 Only the client context changes. The marginal cost drops while the quality increases.
And the quality advantage compounds too. Each framework was applied by a fresh sub-agent with no attention fatigue. The judge reviewed each candidate against the specific client goal — no "good enough" compromises. The compiler resolved conflicts and ensured coherence. Human judgment is preserved for the FINAL review, not wasted on the exploration.
Same Architecture, Different Domain
This isn't a new pattern — it's the same five stages from Part II applied to proposals. The architecture is reusable; only the content changes. The next chapter applies the same supply chain to research and knowledge synthesis — where the ebook you're reading right now serves as the worked example.
Key Takeaways
- • Route: routing index identifies potentially relevant frameworks (broad recall from 30+ to 15–20)
- • Explore: sub-agents apply each framework to the client in parallel (150K–200K tokens of expansion)
- • Judge: sequential selection (pick 1, then next 1) ensures diversity and quality
- • Compress: 200K exploration → 20K proposal content (10:1 compression)
- • Compound: each proposal improves routing and framework selection for the next one
- • Meta-credibility: the proposal IS the proof — the method demonstrates the value
The Research Supply Chain
This ebook was written using a cognition supply chain. The conversation that produced the source material IS a worked example.
A model with 18-month-stale training data.5 No knowledge of current AI capabilities. Yet within minutes, producing expert-level analysis of AI deployment patterns, governance architecture, and implementation strategy.
How? Not by upgrading the model. By routing the conversation through compressed frameworks, exploring with agentic retrieval, judging against specific goals, and compressing findings back into reusable artefacts.
This chapter traces the supply chain through a research workflow — and in doing so, demonstrates the very thesis of this ebook.
The Research Problem: AI Asking AI About AI
"If you ask AI how to implement AI, it won't even scratch the surface. It'll be wrong, inconsistent. But ask AI about blue whales and you've got heaps of knowledge and background."
This is the "unstable domain" problem from Chapter 1.5 The model's training data is stale for fast-moving fields. Without external knowledge, it hallucinates plausibly — producing confident-sounding advice that's months or years behind current practice.
The naive solution: give it access to the internet. Let it search. Let it read. But without a routing index, the agent searches with generic terms, retrieves generic results, and produces generic synthesis. The internet is the ultimate high-noise, low-signal corpus.
The supply chain solution: don't ask the model to know. Give it a map of what exists, the tools to explore, and the judgment layer to filter for relevance.
The Research Supply Chain — Stage by Stage
Stage 1 — Route: The Framework Map
The routing index — 30+ articles indexed with framework name, importance, "retrieve when" triggers, tags, related concepts, and URLs — provides information scent before the first search.10
The agent reads this map before searching. Instantly knows: Lane Doctrine exists, Simplicity Inversion exists, Context Engineering exists — and when each is relevant.
Compare to cold-start research: "Uh... maybe I should search for AI governance? AI deployment? AI strategy?" The routing index collapses this to: "Lane Doctrine applies here. Let me search for Lane Doctrine content."
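In data terms the routing index is nothing exotic: a list of small entries the agent reads before its first search. A minimal sketch follows, with the field names taken from the description above and the entries, triggers, and URLs invented for illustration:

```python
# A routing index as plain data: the map the agent reads before its first
# search. Field names follow the description above; the entries, triggers,
# and URLs are invented for illustration.
ROUTING_INDEX = [
    {"name": "Lane Doctrine",
     "importance": "high",
     "retrieve_when": "questions about AI governance, autonomy boundaries, risk tiers",
     "tags": ["governance", "autonomy", "risk"],
     "related": ["Three-Tier Error Budgets"],
     "url": "https://example.com/lane-doctrine"},          # placeholder URL
    {"name": "Context Engineering",
     "importance": "high",
     "retrieve_when": "questions about context quality, attention budget, signal density",
     "tags": ["context", "attention budget", "signal density"],
     "related": ["Sub-agent compression"],
     "url": "https://example.com/context-engineering"},    # placeholder URL
]

def route(question: str, index=ROUTING_INDEX) -> list[dict]:
    """Route stage: broad recall. Keep any entry whose tags or 'retrieve when'
    triggers overlap the question; a real system lets the model do this read."""
    q = question.lower()
    return [entry for entry in index
            if any(tag in q for tag in entry["tags"])
            or any(trigger in q for trigger in entry["retrieve_when"].lower().split(", "))]

print([entry["name"] for entry in route(
    "How should we set an attention budget for agent context?")])
# -> ['Context Engineering']
```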
Stage 2 — Explore: Iterative Deepening
The agent uses canonical terms from the routing index to search. Each search is informed by previous findings — iterative deepening, not single-shot.21
Example chain: Agent sees "Context Engineering" in the index → searches → finds the "attention budget" concept2 → realises this connects to "signal density" → searches again with refined terms → finds the sub-agent compression pattern → follows THAT thread to Anthropic's multi-agent research.15 Each retrieval sharpens the next query.
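That chain can be expressed as a simple loop: search, collect findings, harvest the new canonical terms they surface, and feed those terms into the next round. A sketch with a canned stand-in for the retrieval tool; the term chain mirrors the example above, everything else is illustrative:

```python
# Iterative deepening: each retrieval sharpens the next query. The canned
# FAKE_RESULTS stand in for a real retrieval tool (RAG, web search, file grep).

FAKE_RESULTS = {  # query term -> findings it would surface (illustrative)
    "context engineering": [{"note": "attention budget", "terms": ["attention budget"]}],
    "attention budget":    [{"note": "signal density", "terms": ["signal density"]}],
    "signal density":      [{"note": "sub-agent compression", "terms": []}],
}

def search_tool(term: str) -> list[dict]:
    """Placeholder retrieval call; a real one would hit a vector store or search API."""
    return FAKE_RESULTS.get(term, [])

def iterative_deepening(seed_terms, max_rounds=4):
    seen, frontier, evidence = set(seed_terms), list(seed_terms), []
    for _ in range(max_rounds):
        if not frontier:
            break                                  # nothing new surfaced: stop
        next_frontier = []
        for term in frontier:                      # each retrieval sharpens the next query
            for finding in search_tool(term):
                evidence.append(finding)
                next_frontier += [t for t in finding["terms"] if t not in seen]
        seen.update(next_frontier)
        frontier = next_frontier
    return evidence

print([f["note"] for f in iterative_deepening(["context engineering"])])
# -> ['attention budget', 'signal density', 'sub-agent compression']
```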
Stage 3 — Judge: Dual-Query Filtering
Retrieval query: "context engineering attention budget signal density" (broad, recall-heavy)
Goal spec: "How does context quality vs quantity affect the cognition supply chain thesis?" (rich, intention-heavy)
The judge filters retrieved chunks against the actual research question — not just semantic similarity.7 Chunks that match the keywords but don't serve the thesis get filtered out.
Stage 4 — Compress: Thousands of Pages → 436 Lines
The research file for this ebook: 436 lines distilled from thousands of pages across 30+ articles, Anthropic's engineering blog, academic papers, and API documentation.23
Each finding: quote + source + analysis + connection to the article thesis. The thousands of pages of intermediate reading vanish. The 436 lines of curated evidence remain.
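Keeping each retained finding in a fixed shape is what lets thousands of pages compress into a few hundred citable lines. A sketch of one such record: the field names mirror the sentence above, the quote and URL come from the Anthropic source cited in this ebook, and the analysis and connection text is illustrative.

```python
# One compressed research finding: quote + source + analysis + connection.
# Field names mirror the sentence above; analysis/connection text is illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    quote: str        # verbatim evidence from the source
    source: str       # where it came from (URL or citation)
    analysis: str     # what the quote means in its own context
    connection: str   # why it matters for this document's thesis

example = Finding(
    quote="The essence of search is compression.",
    source="https://www.anthropic.com/engineering/multi-agent-research-system",
    analysis="Search is framed as distilling insight from a large corpus.",
    connection="Grounds the Compress stage of the supply chain.",
)
print(example.quote, "->", example.connection)
```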
Stage 5 — Compound: A New Framework Emerges
Writing this ebook revealed the "Cognition Supply Chain" as a new named framework. It didn't exist before this research cycle. The routing index gains a new entry. The framework library gains a new framework. The next ebook starts from a higher baseline.22
"The framework you build today solves problems you haven't encountered yet."
The Live Demonstration — This Conversation
The proof is in the source material for this ebook. A recorded conversation where frameworks were described (compressed context), the AI explored the website and followed cross-references between articles, and each exchange deepened understanding. The AI's responses became more specific, more framework-grounded, more nuanced with every turn.
By the end: the AI had synthesised "cognition supply chain" as a concept — something NEITHER participant started with. It emerged from the compounding exploration loop.22 That's the flywheel in action: the conversation itself was a cognition supply chain cycle.
What makes this remarkable: the model's training data predates most of these frameworks. It had never "seen" Lane Doctrine or Simplicity Inversion in training. Yet with the routing index and framework access, it reasoned with these concepts expertly.5 The map made the model a specialist.
Research Maturity Progression
The retrieval maturity ladder (Chapter 2) applied specifically to research workflows:
Level 1: Manual Search
Read articles. Copy quotes into a document. Hope you don't miss anything important. Exhausting. Error-prone. Doesn't scale.
Level 2: RAG Over Knowledge Base
Better recall, but chunk-level — misses cross-document connections and dependency chains.12 Most "we do research with AI" organisations sit here.
Level 3: Agentic RAG
The agent reformulates queries, spots gaps, follows leads.8 Still bounded by chunk similarity, but vastly better than single-shot. The routing index alone gets you here.
Level 4: Full Research Supply Chain
Routing index → agentic exploration with tool access → dual-query judgment → sub-agent compression → compiled research artefact. This is the moat.
The highest-ROI upgrade: going from Level 2 to Level 3 by adding a routing index. A few hours of work that permanently transforms research quality. The full Level 4 supply chain is the long-term goal; the routing index is the starting point.
"AI Asking AI About AI" — Solved
The original problem ("AI asking AI about AI doesn't work") is now solved. Not by making the model smarter, but by giving it a map of what exists, the tools to explore, and a judgment layer that filters for relevance.
The insight generalises beyond AI research. Any "unstable domain" — where model training data is stale and the search space is unbounded — benefits from the same supply chain.6 Regulatory compliance. Emerging technology assessment. Competitive analysis. Market strategy. Anywhere the ground shifts faster than models can be retrained.
The existence of this ebook is the meta-proof. Written using the supply chain, about the supply chain. If the architecture didn't work, you'd be reading generic AI advice. Instead, every chapter is grounded in specific frameworks, backed by specific evidence, and structured through specific patterns — all fed through the pipeline.
From Examples to Implementation
Part III has shown the supply chain applied to two domains: proposals (Chapter 7) and research (this chapter). Same architecture, different content. The final chapter provides a practical implementation guide — how to build your first cognition supply chain, stage by stage, starting with the routing index.
Key Takeaways
- • "AI asking AI about AI" fails without a supply chain — the domain is too unstable, the search space too unbounded
- • The routing index bridges the stale-training gap by providing vocabulary and topology the model lacks
- • Iterative deepening (not single-shot search) enables compounding understanding across exploration cycles
- • The counter-factual test proves it: same model, with vs without supply chain = radically different quality
- • Any unstable domain benefits from the same architecture — this isn't AI-specific
Building Your First Supply Chain
Build your routing index this week. Start with 10 framework one-liners and a “retrieve when” trigger for each.
You don't need the full five-stage pipeline on day one. You need the minimum viable supply chain — and the discipline to compress each cycle's learning back into the system.
This chapter maps the implementation path from "we stuff context into system prompts" to "we run a compounding cognition engine." Each stage builds on the last. Start this week.
The Minimum Viable Supply Chain
What you actually need to start: a routing index + agentic search + one judgment layer. That's it. No sub-agent orchestration required on day one. No fancy tooling. No custom infrastructure. A markdown file with framework entries. An AI tool that can search. A habit of separating your search query from your goal.
The 80/20 insight: before the routing index, the agent searches with problem terms and gets generic results. After the routing index, the agent searches with canonical terms and gets targeted results.10 This single change transforms output quality more than upgrading your model.6
Stage 1 — Build Your Routing Index (This Week)
Start with what you know. List your 10–15 most important concepts, frameworks, documents, or knowledge assets. You already know what matters — you just haven't written it down in a machine-readable format.
The minimum entry for each item, shown here as a worked example for a customer onboarding document:
- One-liner: 7-step onboarding workflow covering account setup through first value delivery
- Retrieve when: Questions about customer setup, activation, time-to-value, churn reduction
- Tags: onboarding, customer success, activation, churn, first value
- Location: /docs/processes/customer-onboarding-v3.md
Iteration rule: After each significant AI interaction, ask: "Was there a concept the agent should have known about but didn't?" If yes, add a routing entry. The index grows organically. 5 minutes per new entry.
Stage 2 — Wire Search to the Routing Index
The simplest implementation: before your AI searches, tell it to read the routing index first. A single instruction in your system prompt or briefing:
"Before you search, read the routing index. Use the canonical terms and tags from the index to formulate your search queries."
This alone transforms query quality from problem-terms to corpus-terms. With basic RAG, the routing index becomes a query expansion source — agent reads index, identifies relevant entries, searches with canonical terms. Hits are guaranteed because the terms come FROM the corpus.
With agentic search, the routing index becomes the starting map.13 Agent reads index, selects promising branches, explores those documents first, follows cross-references. The key discipline: the agent reads the MAP before it searches the TERRITORY. Non-negotiable regardless of retrieval strategy.
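The wiring itself is small: read the index, find the entries whose triggers match, and build the search query from their canonical terms instead of the user's problem phrasing. A sketch under those assumptions; the entry, instruction wording, and helper names are illustrative:

```python
# Wire search to the routing index: the agent reads the MAP (index) before
# the TERRITORY (corpus). Entry and helper names are illustrative.
ROUTING_INDEX = [
    {"name": "Three-Tier Error Budgets",
     "retrieve_when": "error tolerance, failure modes, zero-tolerance stakeholders",
     "tags": ["error budgets", "failure", "governance"]},
]

SYSTEM_INSTRUCTION = (  # the one-line briefing instruction described above
    "Before searching, read the routing index. Use the canonical terms and "
    "tags from the index to formulate your search queries."
)

def expand_query(problem_statement: str, index=ROUTING_INDEX) -> str:
    """Query expansion: swap problem phrasing for corpus vocabulary taken from
    the matching index entries, so hits use the corpus's own terms."""
    words = problem_statement.lower()
    terms = []
    for entry in index:
        if any(tag in words for tag in entry["tags"]) \
           or any(t.strip() in words for t in entry["retrieve_when"].split(",")):
            terms.append(entry["name"])
            terms.extend(entry["tags"])
    return " ".join(dict.fromkeys(terms))   # de-duplicate, keep order

print(expand_query("Our client is worried about failure and error tolerance"))
# -> "Three-Tier Error Budgets error budgets failure governance"
```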
Stage 3 — Add the Dual-Query Pattern
When to add this: When your search results are "relevant but not quite right" — in the right neighbourhood but not serving your specific goal.
Implementation is a habit change, not an infrastructure change:
Step 1: Write Two Things
A search query (broad, keyword-focused, optimised for recall) and a goal statement (rich, specific, describing exactly what you need and why).
Step 2: Search With the Query
Retrieve candidates using the broad query. Cast a wide net.
Step 3: Judge Against the Goal
Send candidates + goal statement to the model: "Given this goal, which of these results are most relevant? Why?"
Dual-Query Example: Insurance Client
Search Query (Broad)
"error handling AI systems failure modes"
Goal Statement (Specific)
"Our insurance client has a zero-tolerance board. We need frameworks for pre-negotiating error budgets so one mistake doesn't kill the project."
Judge filters for: Three-Tier Error Budgets, the One-Error Death Spiral pattern1 — not generic error handling advice. The insurance context stays in the goal, not the search.
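Put together, the pattern is two strings and one judging call. A sketch using the insurance example above; the retrieve and llm functions are placeholders for whatever retrieval backend and model API you actually use:

```python
# Dual-query pattern: search with a broad query, judge candidates against a
# rich goal. `retrieve` and `llm` are placeholders for your retrieval backend
# and model API; the canned strings are illustrative.

SEARCH_QUERY = "error handling AI systems failure modes"          # broad, recall-heavy
GOAL = ("Our insurance client has a zero-tolerance board. We need frameworks "
        "for pre-negotiating error budgets so one mistake doesn't kill the project.")

def retrieve(query: str) -> list[str]:
    """Placeholder: return candidate chunks for the broad query."""
    return ["Three-Tier Error Budgets ...",
            "Generic try/except advice ...",
            "The One-Error Death Spiral ..."]

def llm(prompt: str) -> str:
    """Placeholder for a model call; swap in a real API client here."""
    return "[0] and [2] serve the goal; [1] matches keywords but stays generic."

def judge(candidates: list[str], goal: str) -> str:
    prompt = (f"Goal: {goal}\n\n"
              + "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
              + "\n\nGiven this goal, which of these results are most relevant? Why?")
    return llm(prompt)   # filter for goal fit, not keyword similarity

print(judge(retrieve(SEARCH_QUERY), GOAL))
```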
The habit: every time you search, ask yourself — "Am I searching with problem keywords, or am I searching with one query and judging with another?"7 Separate the two.
Stage 4 — Add Sub-Agent Compression
When to add this: When your research or exploration generates too much volume for the main context to handle effectively.2 When you notice the model losing coherence because it's processing too many findings at once.
Most AI coding tools already support sub-agent or sub-task patterns. Give the sub-agent a briefing packet (Chapter 5): Mission, Non-goals, Constraints, Client context, Framework anchors, Deliverable format.18 Let it explore. Receive the compressed output. Read THAT, not the exploration trace.
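The briefing packet is just a fixed set of fields rendered into the sub-agent's prompt, with the rule that only the compressed deliverable comes back into the main context. A sketch under those assumptions; the field names follow the list above, and the dispatch call is a placeholder for whatever sub-task mechanism your tool provides:

```python
# Sub-agent compression: hand over a briefing packet, get back only the
# compressed deliverable, never the exploration trace. Field names follow the
# chapter's list; the dispatch mechanism and values are illustrative.
from dataclasses import dataclass

@dataclass
class BriefingPacket:
    mission: str             # what the sub-agent must accomplish
    non_goals: str           # what it must NOT spend tokens on
    constraints: str         # hard limits (length, tone, scope)
    client_context: str      # the specifics of this engagement
    framework_anchors: str   # which frameworks to apply
    deliverable_format: str  # the shape of the compressed output

def render_prompt(packet: BriefingPacket) -> str:
    """Turn the packet into the sub-agent's briefing text."""
    return "\n".join(f"{field.replace('_', ' ').title()}: {value}"
                     for field, value in vars(packet).items())

def dispatch(packet: BriefingPacket) -> str:
    """Placeholder: a real implementation would send render_prompt(packet) to a
    sub-agent with its own context window and return only its final summary."""
    return f"[compressed deliverable for: {packet.mission}]"

packet = BriefingPacket(
    mission="Apply Three-Tier Error Budgets to the insurance client brief",
    non_goals="Do not restate the framework; do not cover unrelated risks",
    constraints="Max 500 words, cite sources",
    client_context="Insurance client with a zero-tolerance board",
    framework_anchors="Three-Tier Error Budgets",
    deliverable_format="Finding + evidence + recommendation",
)
print(dispatch(packet))   # the main context reads THIS, not the exploration trace
```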
Stage 5 — Close the Loop (From Day One)
This isn't an advanced stage; it's a discipline that should start from the first routing index. After each significant cycle, ask: what did this cycle teach us? Was there a concept the agent should have known about but didn't? What belongs back in the routing index or framework library?
The compound effect: Cycle 1 has 10 routing entries. Cycle 5 has 25. Cycle 20 has 50. Each entry represents institutional learning that makes every future interaction more efficient.22
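The discipline fits in a few lines of habit: after a cycle, if a gap surfaced, append an entry. A sketch in the same spirit; the entry shape matches Stage 1 and the specific values are illustrative:

```python
# Close the loop: after each significant cycle, compress what was learned into
# a new routing entry. The gap check is the human question reduced to a flag;
# the entry content is illustrative.
routing_index: list[dict] = []   # imagine ~10 existing entries at cycle 1

def close_the_loop(index: list[dict], *, gap_found: bool,
                   name: str, one_liner: str, retrieve_when: str) -> None:
    """Five-minute post-cycle discipline: if the agent lacked a concept it
    should have known, add a routing entry so the next cycle starts higher."""
    if gap_found:
        index.append({"name": name,
                      "one_liner": one_liner,
                      "retrieve_when": retrieve_when})

close_the_loop(routing_index, gap_found=True,
               name="Cognition Supply Chain",
               one_liner="Route, explore, judge, compress, compound",
               retrieve_when="questions about context architecture or generic AI output")
print(f"{len(routing_index)} routing entries")   # grows by one per captured learning
```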
The Supply Chain Implementation Path
| Stage | What to Build | When to Add | Time |
|---|---|---|---|
| 1. Route | Routing index (10–15 entries) | This week | 1–2 hours initial |
| 2. Search | Wire search to read index first | Same week | 30 minutes |
| 3. Judge | Dual-query pattern | When results are "close but not right" | Habit change |
| 4. Compress | Sub-agent with briefing packet | When research volume exceeds context | 1 hour setup |
| 5. Compound | Post-cycle compression discipline | From day one | 5 min per cycle |
When NOT to Use the Full Supply Chain
Not everything needs this. Honest assessment:
✗ Skip the Supply Chain When
- • Simple queries — "What's the syntax for X?" Just ask the model.
- • Stable domains — Blue-whale-type questions where training data is sufficient.
- • Low-stakes, one-off tasks — If the output doesn't need to be domain-specific or deeply researched.
✓ Use the Supply Chain When
- • Unstable domains — fast-moving fields, stale training data.
- • Dependency-shaped knowledge12 — understanding requires cross-document traversal.
- • Client-specific output — generic advice isn't good enough.
- • Repeated workflows — the compounding loop pays for setup.
- • Institutional knowledge — the answer lives in YOUR knowledge base.
The decision rule: if your outputs are consistently brilliant without architecture, you don't need this. If they're consistently generic despite using frontier models, you need this yesterday.
Start Building Today
You don't need a smarter model. You need a cognition supply chain. The architecture determines output quality. The architecture compounds. Build your routing index this week — 10 entries, 1–2 hours. It will transform every AI interaction from that point forward.
The competitive moat isn't which model you use. It's which supply chain you've built.
Key Takeaways
- • Start with the routing index — 80% of the quality improvement for 1–2 hours of work
- • Wire search to the index before searching the territory (non-negotiable discipline)
- • Add dual-query pattern when results are "close but not right"
- • Add sub-agent compression when research volume exceeds main context capacity
- • Close the loop from day one — 5 minutes of post-cycle compression enables compounding
- • Not everything needs the full supply chain — use it for unstable domains, dependency-shaped knowledge, and repeated workflows
References & Sources
This ebook draws on three categories of sources: primary research from AI engineering teams and academia, industry analysis from search and retrieval platforms, and the author's practitioner frameworks developed through enterprise AI transformation consulting. External sources are cited formally throughout. Author frameworks are presented as interpretive analysis and listed here for readers who want to explore the underlying thinking.
Numbered Citations
[1] Root Causes of Failure for Artificial Intelligence Projects
RAND Corporation research showing AI projects fail at twice the rate of non-AI IT projects, with failure driven primarily by governance and organizational issues rather than technical capability limitations.
https://www.rand.org/pubs/research_reports/RRA2680-1.html
[2] Effective Context Engineering for AI Agents
Anthropic research establishing the attention budget principle: context must be treated as finite resource with diminishing marginal returns. LLMs have working memory capacity limits analogous to human cognition.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[3] On The Computational Complexity of Self-Attention (arXiv:2209.04881)
Academic research demonstrating self-attention scales quadratically with sequence length. Doubling context size quadruples computational load and attention diffusion.
https://arxiv.org/abs/2209.04881
[4] Context Quality Over Quantity
Anthropic research demonstrating that smaller, cleaner context outperforms larger, noisier context due to attention diffusion effects in transformer models.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[5] Worldview Recursive Compression
LeverageAI framework for compiling domain expertise into reusable frameworks that serve as "kernel" for AI interactions. Demonstrates how compressed worldview architecture enables compounding returns.
https://leverageai.com.au/worldview-recursive-compression-how-to-better-encompass-your-worldview-with-ai/
[6] Architecture Over Model Capability
Andrew Ng's research demonstrating GPT-3.5 with agentic architecture outperforms GPT-4 alone, proving that system design matters more than raw model capability.
https://www.insightpartners.com/ideas/andrew-ng-why-agentic-ai-is-the-smart-bet-for-most-enterprises/
[7] Tavily AI-Powered Search for Developers
Two-step search-then-extract flow optimized for LLM ingestion. Demonstrates the dual-query pattern (search for candidates, extract for relevance) as a commercial product validating Stages 2 + 3 of the cognition supply chain.
https://www.tavily.com/blog/tavily-101-ai-powered-search-for-developers
[8] Claude Web Search Tool
Agentic search loop where Claude decides when to search, with the API running searches that can repeat multiple times per request. Industry validation that retrieval is moving from single-shot to exploration-based patterns.
https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
[9] Building Effective Agents
Anthropic research showing multi-agent orchestration achieves 90.2% success rate versus 14-23% for single-agent approaches on SWE-bench benchmarks. Validates the sub-agent compression architecture (Stages 4 + 5).
https://www.anthropic.com/research/building-effective-agents
[10] Information Foraging: A Theory of How People Navigate on the Web
Peter Pirolli and Stuart Card's foundational theory from Xerox PARC explaining how agents (human and AI) navigate knowledge spaces by following "information scent" — cues that signal whether a path leads to valuable content. Provides the academic grounding for routing indexes as navigation layers that maximize value per unit of exploration effort.
https://www.nngroup.com/articles/information-foraging/
[11] RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Academic research (Sarthi et al., arXiv:2401.18059) demonstrating hierarchical retrieval architecture using tree-structured summaries that retrieve across levels of abstraction. Addresses the global context loss problem inherent in flat chunking approaches. Routing indexes represent a lightweight, human-curated implementation of similar hierarchical navigation principles.
https://arxiv.org/abs/2401.18059
[12] Microsoft RAG Chunking & Parent-Child Retrieval
Microsoft Azure Architecture guidance on semantic chunking limitations in standard RAG systems. Documents how parent-child retrieval patterns address the fundamental tradeoff: small chunks optimize for semantic matching but lose global context, while hierarchical strategies preserve document structure and cross-reference relationships.
https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/rag/rag-chunking-phase
[13] Agentic File Search Pattern
Three-phase exploration architecture (scan, deep dive, backtrack) applying coding agent patterns to document retrieval. Demonstrates how agents can follow cross-document references rather than relying solely on semantic similarity, addressing RAG's fundamental inability to traverse dependency-shaped knowledge.
https://github.com/PromptEngineer48/agentic-file-search
[14] LlamaIndex File-Based Agents
LlamaIndex documentation and benchmarks showing file-explorer agents match RAG quality for complex queries while trading latency for depth. Validates the hybrid architecture: RAG for real-time single-document queries, exploration agents for cross-document dependency analysis in background/async workflows.
https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_with_llamaindex/
[15] Building Effective Multi-Agent Research Systems
Anthropic engineering research establishing the foundational principle: "The essence of search is compression: distilling insights from a vast corpus." Documents how subagents facilitate compression by operating in parallel with their own context windows, exploring different aspects of questions simultaneously.
https://www.anthropic.com/engineering/multi-agent-research-system
[16] Parallel Tool Calling Research Time Reduction
Anthropic research demonstrating that parallel tool calling in multi-agent systems cuts research time by up to 90% for complex queries. Validates the token economics argument for sub-agent compression patterns.
https://www.anthropic.com/engineering/multi-agent-research-system
[17] Multi-Agent Systems for Breadth-First Queries
Anthropic internal evaluations showing multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously. Demonstrates when to use scatter-gather patterns.
https://www.anthropic.com/engineering/multi-agent-research-system
[18] Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20 Again
LeverageAI framework establishing the ephemeral sandbox pattern for sub-agents using the fork-exec pattern from Unix process isolation. Demonstrates how context isolation provides quality control by architecture rather than by hope.
https://leverageai.com.au/context-engineering-why-building-ai-agents-feels-like-programming-on-a-vic-20-again/
[19] DeepMind - AlphaGo & Monte Carlo Tree Search
AlphaGo demonstrated chess-style search: neural networks combined with Monte Carlo tree search, exploring promising moves deeply through guided simulation and position evaluation. Validates move-ordering heuristics as fundamental to efficient search through large decision spaces.
https://www.deepmind.com/research/highlighted-research/alphago
[20] Multi-Agent Research Systems Performance
Anthropic research showing multi-agent research systems excel for breadth-first queries that pursue multiple independent directions simultaneously. Parallel tool calling cuts research time by up to 90% for complex queries by enabling agents to explore different aspects in parallel.
https://www.anthropic.com/engineering/multi-agent-research-system
[21] Monte Carlo Tree Search (MCTS)
MCTS algorithm designed for problems with extremely large decision spaces like Go (10^170 possible states). Instead of exploring all moves, MCTS incrementally builds a search tree using random simulations to guide decisions, demonstrating iterative deepening principles.
https://www.geeksforgeeks.org/machine-learning/ml-monte-carlo-tree-search-mcts/
[22] LeverageAI: The AI Learning Flywheel
LeverageAI research demonstrating that 1% daily improvement applied to an improved baseline equals 3,778% better performance after one year through compounding returns. Framework validates the kernel flywheel mechanism where each cycle starts from a higher baseline than the last.
https://leverageai.com.au/wp-content/media/The_AI_Learning_Flywheel_ebook.html
[23] The Essence of Search is Compression
Anthropic research establishing the foundational principle that search fundamentally involves compression - distilling insights from a vast corpus. Subagents facilitate this compression by operating in parallel with their own context windows, exploring different aspects of questions simultaneously and enabling 10x compression ratios (200K → 20K tokens) with intelligence preserved.
https://www.anthropic.com/engineering/multi-agent-research-system
[24] Andrew Ng - Agentic Workflows
Andrew Ng's research demonstrating that GPT-3.5 with agentic architecture outperforms GPT-4 alone, proving that system design and architectural patterns matter more than raw model capability. Validates the thesis that architecture compounds while model upgrades provide only linear improvements.
https://www.insightpartners.com/ideas/andrew-ng-why-agentic-ai-is-the-smart-bet-for-most-enterprises/
[25] The Use of MMR, Diversity-Based Reranking for Reordering Documents
Carbonell & Goldstein's foundational 1998 research introducing the Maximum Marginal Relevance (MMR) algorithm. Computes a score balancing relevance and diversity, preventing redundancy while maintaining query relevance in search results. Formula: MMR = λ × relevance_score − (1 − λ) × max(similarity_with_selected_docs). Widely adopted for search result diversification, summarization, and recommendation systems.
https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf
[26] MIT Sloan Review: The End of Scale
MIT Sloan Management Review research establishing that AI enables economies of specificity replacing economies of scale. When cost of key inputs (cognition, information) drops dramatically, economic structure reorganizes around new abundance, making customization cheaper than standardization. Structural shift as fundamental as agrarian to industrial transition.
https://sloanreview.mit.edu/article/the-end-of-scale/
[27] MIT Sloan Review: Economies of Specificity Applied to Consulting
MIT Sloan analysis demonstrating how industrial-era economics favored standardization while AI-era economics enables perfect customization at lower cost than generic materials. When cognition is cheap and parallel, the economics flip from one-niche-one-template to individualized compilation.
https://sloanreview.mit.edu/article/the-end-of-scale/
[28] Consulting Success: AI Business Models and Best Practices
Consulting Success industry analysis showing AI reduces proposal development time by 70% while improving win rates. Traditional manual framework selection and proposal writing requires 2-3 consultant-days versus hours for AI-powered cognition supply chain approach, demonstrating dramatic efficiency gains in professional services.
https://www.consultingsuccess.com/consulting-business-models
[29] Qatalyst: AI-Powered Proposal Generator Case Study
Qatalyst case study demonstrating AI proposal system used to develop over 50 proposals across range of sectors. System significantly reduced revisions and demonstrated scalable infrastructure where marginal cost drops while quality increases after initial setup investment in routing indexes and briefing protocols.
https://qatalyst.ca/case-studies/study/proposal-generator
[30] Boutique Consulting Club: Win Rate Analysis
Boutique Consulting Club industry analysis showing custom, well-researched proposals achieve 60-90% win rate versus 20-30% for generic materials. Demonstrates that meta-credibility from bespoke proposal generation provides measurable competitive advantage in professional services market.
https://www.boutiqueconsultingclub.com/blog/win-rate
Primary Research
Effective Context Engineering for AI Agents
Attention budget principle: context as finite resource with diminishing marginal returns. LLMs have working memory capacity limits analogous to human cognition.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
On The Computational Complexity of Self-Attention (arXiv:2209.04881)
Self-attention scales quadratically with sequence length. Doubling context size quadruples computational load and attention diffusion.
https://arxiv.org/abs/2209.04881
Building Effective Agents
Multi-agent orchestration research showing 90.2% success rate vs 14–23% for single-agent approaches on SWE-bench benchmarks.
https://www.anthropic.com/research/building-effective-agents
Multi-Agent Research Systems
"The essence of search is compression: distilling insights from a vast corpus." Parallel tool calling cuts research time by up to 90% for complex queries. Sub-agents facilitate compression by operating with their own context windows.
https://www.anthropic.com/engineering/multi-agent-research-system
Information Foraging: A Theory of How People Navigate on the Web
Peter Pirolli and Stuart Card's foundational theory explaining how agents follow "information scent" to maximise value per unit of exploration effort. Academic grounding for routing indexes as navigation layers.
https://www.nngroup.com/articles/information-foraging/
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (arXiv:2401.18059)
Hierarchical retrieval with tree-structured summaries that retrieve across levels of abstraction, addressing chunking's global context loss.
https://arxiv.org/abs/2401.18059
Industry Analysis & Platforms
Tavily: AI-Powered Search for Developers
Two-step search-then-extract flow optimised for LLM ingestion. Demonstrates the dual-query pattern (search for candidates, extract for relevance) as a commercial product.
https://www.tavily.com/blog/tavily-101-ai-powered-search-for-developers
Claude Developer Platform: Web Search Tool
Agentic search loop where Claude decides when to search, repeating multiple times per request. Industry validation that retrieval is moving from single-shot to exploration.
https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
LeverageAI / Scott Farrell
Practitioner frameworks and interpretive analysis developed through enterprise AI transformation consulting. These frameworks are presented as the author's voice throughout the ebook and listed here for readers who want to explore the underlying articles.
Worldview Recursive Compression
The kernel flywheel: compile domain expertise into frameworks that serve as reusable source code for AI interactions. The compounding loop described in Chapter 6.
https://leverageai.com.au/worldview-recursive-compression-how-to-better-encompass-your-worldview-with-ai/
Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20 Again
Treat LLM context like OS memory management. Tiered memory hierarchy, signal density over volume, sub-agents as ephemeral sandboxes. Foundation for the compression architecture in Chapter 5.
https://leverageai.com.au/context-engineering-why-building-ai-agents-feels-like-programming-on-a-vic-20-again/
Discovery Accelerators: The Path to AGI Through Visible Reasoning Systems
Chess-style search through idea space with move-ordering heuristics. Systematic exploration at ~100 nodes/minute. The exploration pattern referenced in Chapter 6.
https://leverageai.com.au/discovery-accelerators-the-path-to-agi-through-visible-reasoning-systems/
The Fast-Slow Split: Breaking the Real-Time AI Constraint
Separate the talker from the thinker. Exploration is the slow lane doing heavy cognition; interactive layers are the fast lane. Referenced in Chapter 4's hybrid architecture.
https://leverageai.com.au/the-fast-slow-split-breaking-the-real-time-ai-constraint/
Stop Picking a Niche: Send Bespoke Proposals Instead
Marketplace of One — AI inverts customisation economics so bespoke proposals are cheaper than generic sales materials. Applied in Chapter 7's proposal supply chain.
https://leverageai.com.au/stop-picking-a-niche-send-bespoke-proposals-instead/
Knowledge is a Tool: RAG for Agentic Systems
RAG fundamentals that the cognition supply chain builds upon. Single-document agentic RAG patterns.
https://leverageai.com.au/knowledge-is-a-tool-rag-for-agentic-systems/
Micro-Agents, Macro-Impact
Router/Supervisor/Worker architecture for composable AI agents. The micro-agent pattern underpins the sub-agent compression architecture.
https://leverageai.com.au/micro-agents-macro-impact-why-small-composable-ai-agents-beat-one-mega-brain/
The Three Ingredients Behind Unreasonably Good AI Results
Agency + Tools + Orchestration = compounding returns. The three requirements for the cognition supply chain to function.
https://leverageai.com.au/the-three-ingredients-behind-unreasonably-good-ai-results/
Breaking the 1-Hour Barrier: AI Agents That Build Understanding Over 10+ Hours
Stateless workers + stateful orchestration for extended AI sessions. The long-running agent patterns that enable deep supply chain cycles.
https://leverageai.com.au/breaking-the-1-hour-barrier-ai-agents-that-build-understanding-over-10-hours/
Progressive Resolution: The Diffusion Architecture for Complex Work
Coarse-to-fine resolution layers. The progressive refinement pattern that mirrors routing index → exploration → compression.
https://leverageai.com.au/progressive-resolution-the-diffusion-architecture-for-complex-work/
Note on Research Methodology
Research for this ebook was conducted using the cognition supply chain architecture it describes. A routing index of 30+ framework articles provided information scent for agentic RAG search. External sources (Anthropic engineering blog, academic papers, API documentation) were retrieved through dual-query patterns and judged against the ebook's thesis. Findings were compressed into a structured research artefact before chapter writing began.
All statistics and quotes are attributed to their original sources. Author frameworks are presented as interpretive analysis informed by enterprise AI consulting practice, not as independent research findings.
Compiled February 2026. Some links may require subscription access or may have been updated since compilation.