A LeverageAI Ebook
The Cognition
Supply Chain
From Search to Compounding Agentic Cognition
Scott Farrell
LeverageAI — 2026
The Wrong Variable
You upgraded to the frontier model. Your outputs are still generic. The model isn't the problem.
You've just upgraded to the frontier model. Spent weeks migrating. Your team is excited. The benchmarks are incredible — higher scores on every leaderboard, bigger context window, faster inference. You run your first real domain query.
The output is... still generic. Still vague. Still missing the nuances your domain demands. Still the kind of thing that sounds plausible to someone who doesn't know the subject, and frustrating to someone who does.
The natural conclusion: "We need better prompts." Or: "The model isn't smart enough yet — maybe the next release."
Both conclusions are wrong. You're optimising the wrong variable. The model isn't the bottleneck. The supply chain feeding it context is.
The Model Intelligence Myth
The default assumption driving the entire AI industry: better model = better output. Bigger context window = more knowledge. Higher benchmark scores = more capable. This is the mental model everyone runs on.
Model vendors market intelligence as THE differentiator. Upgrading feels like progress without architectural risk. Benchmarks reinforce the narrative — MMLU, HumanEval, GPQA — each new release inches the numbers upward, and the marketing machines spin it into inevitability.
But here's the uncomfortable truth: for domain-specific, unstable-domain work — the kind enterprises actually need — model capability matters far less than what the model gets to think with.1
Compressed context works remarkably well. Frameworks take the model from zero to 100 in one second. Not because the model suddenly learned your whole worldview. But because the right context architecture gave it a high-bandwidth index into everything it needed.
The model didn't get smarter. It got a map.
Blue Whales vs Implement AI
Ask any model about blue whales. You'll get brilliant, detailed, nuanced answers. Migration patterns, population dynamics, the physics of baleen feeding, conservation timelines. Excellent stuff.
Now ask it how to implement AI in your organisation.
Fog. Plausible-sounding generalities. "Consider your use case." "Start with a pilot." "Ensure executive buy-in." The kind of advice that fills whitepapers but empties bank accounts. If you ask AI how to implement AI, it won't even scratch the surface. It'll be wrong, inconsistent. But ask AI about blue whales and you've got heaps of knowledge and background.
Why the difference? Blue whales are a stable domain: well-established facts, slow-moving consensus, clear signals, bounded disagreement. A language model does well here even with stale training data, because the underlying shape of the knowledge doesn't change daily.
"Implement AI" is an unstable domain: the technology changes quarterly, best practices are still forming, failure modes are socio-technical (politics, incentives, governance), and the question is massively under-specified. What industry? What risk appetite? What data maturity? What team structure?
The prompt is effectively: "Please search an unbounded space of possibilities and also guess which constraints I forgot to mention." Any navigator would struggle without a map, regardless of intelligence.
The Blue Whales Test
Ask your AI system a domain question. Then ask it about blue whales.
If the blue whales answer is dramatically better, you don't have a model problem. You have a context architecture problem. The model can reason brilliantly — it just doesn't have a map of YOUR territory.
This diagnostic takes 60 seconds and reveals whether upgrading your model or upgrading your context architecture will produce the bigger improvement.
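For teams who want to script the check, here is a minimal sketch, assuming a generic `call_llm` helper standing in for whichever model client you already use. The helper and the example questions are illustrative, not part of any specific API.

```python
# The Blue Whales Test as a 60-second script.
# `call_llm` is a placeholder: swap in your actual model client.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real call to your model of choice."""
    return f"[model answer to: {prompt}]"

def blue_whales_test(domain_question: str) -> None:
    control = call_llm("Describe the feeding behaviour and migration patterns of blue whales.")
    candidate = call_llm(domain_question)
    print("--- Stable-domain control (blue whales) ---\n", control)
    print("--- Your domain question ---\n", candidate)
    # If the control answer is dramatically richer and more specific,
    # the bottleneck is context architecture, not model capability.

blue_whales_test("How should a 500-person manufacturer start implementing AI?")
```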
Frameworks as Compression Codec
What solves the unstable-domain problem? Compressed frameworks that turn messy, high-entropy problems into named shapes the model can reason with.
Once you name the shapes — lane doctrine, augmentation vs replacement, fast-slow split, governance-first, economies of specificity — you've reduced the search space brutally. The model doesn't have to "invent wisdom." It navigates a map you already drew.
"Once you name the shapes, the search space collapses brutally. The model navigates a map — it doesn't invent wisdom."
Frameworks help AI systems in three specific ways:
1. Routing
"Which lane am I in?" Augment vs automate vs reinvent. Batch vs real-time. Internal vs customer-facing. Governed vs unmanaged. The framework tells the model WHERE to focus before it starts reasoning.
2. Compression
"What do I need to remember?" A few principles instead of a thousand anecdotes. The framework reduces a complex domain to its load-bearing abstractions.
3. Consistency
"What should I do next?" Repeatable choices, not vibes. The framework ensures the model makes the same high-quality decision regardless of how the question is phrased.
This is the "zero to 100 in one second" effect. Frameworks provide a high-bandwidth index into a worldview. The model didn't learn your whole worldview in one second — the frameworks provided a compression codec that makes the worldview navigable instantly.
Context Quality vs Context Quantity
The conventional wisdom: more context = better results. Bigger context window = bigger opportunity. RAG retrieves more chunks = more knowledge.
The reality: context is not just capacity — it's attention.
"Context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an 'attention budget' that they draw on when parsing large volumes of context."2
Every irrelevant token in the context window actively degrades the model's focus and output quality. It's not neutral noise — it's attention theft.
The practical consequence: 50K tokens of pure signal outperforms 200K tokens of noise + signal.4 Smaller, cleaner context beats bloated context every time.
Raw context gives you lots of text at high cost with low signal. Compressed context gives you a small number of high-leverage abstractions at low cost with high signal. The race for bigger context windows misses the point entirely. It's like making a bigger warehouse instead of improving your supply chain logistics.
The Cosmic Joke
Here's the punchline that should change how you invest: once you have the right context architecture — the thinking OS — the exact model matters less than everyone thinks.
Bigger models help, sure. But structure is how you turn capability into outcomes. A well-architected supply chain feeding a "last-generation" model will consistently outperform a frontier model running on raw context stuffing.
The evidence: frameworks compiled 6+ months ago still produce expert-level outputs from models with 18-month-stale training data. The frameworks bridge the training gap. The routing index provides vocabulary the model never saw in training. The compression patterns keep the context clean. The architecture does the heavy lifting.
"The cheeky cosmic joke: once you have the thinking OS, the exact model matters less than everyone thinks."
What this means for investment: stop chasing model upgrades. Start building framework libraries and routing indexes. The competitive moat shifts from "which model" to "which supply chain."
The industry is spending billions upgrading models while users still stuff raw text into system prompts. The 10x improvement isn't in the model — it's in the architecture around the model.
Setting Up the Architecture
If model intelligence isn't the answer, what is? Architecture. Specifically: a supply chain for cognition.
The next chapter names the architecture and its five stages. The rest of this ebook builds each stage.
Key Takeaways
- • Model intelligence is overrated for domain-specific work — context quality is the binding constraint
- • The Blue Whales Test diagnoses whether your problem is model capability or context architecture
- • Frameworks act as compression codecs that collapse the search space for AI systems
- • Attention is finite — smaller, cleaner context outperforms larger, noisier context
- • The competitive moat is shifting from model choice to context architecture
The Cognition Supply Chain
Manufacturing supply chains transformed raw materials into finished goods. Cognition needs the same architecture.
In manufacturing, nobody expects brilliant products from throwing raw materials at the factory floor and hoping. There's a supply chain: sourcing, processing, quality control, assembly, delivery. Each stage adds value. Each stage has discipline. The whole thing compounds because yesterday's production teaches you how to produce better tomorrow.
Yet that's exactly what most AI implementations do: throw raw text — system prompts, basic RAG chunks, unstructured documents — at a frontier model and wonder why the output feels like it was written by a well-read stranger who knows nothing about your business.
We call this architecture The Cognition Supply Chain — the end-to-end pipeline from raw knowledge to client-ready, domain-specific AI output.
Naming the Architecture
The vocabulary shift is deliberate. "AI recommendations" sounds like magic that should just work. "Cognition supply chain" sounds like engineering that needs discipline. It imports 20+ years of supply chain management thinking — routing, QC, compression, just-in-time delivery, feedback loops — and applies it to knowledge.
"The vocabulary shift matters. 'AI recommendations' sounds like magic. 'Cognition supply chain' sounds like engineering that needs discipline."
The Supply Chain Metaphor Mapped
Manufacturing Supply Chain
- • Raw materials → sourced, graded, inventoried
- • Processing → shaped, refined, quality-checked
- • Assembly → combined into finished product
- • Delivery → shipped to the right customer
- • Feedback loop → production data improves next run
Cognition Supply Chain
- • Raw materials → domain knowledge, frameworks, institutional memory
- • Processing → agentic exploration, cross-document traversal
- • Quality control → dual-query judgment, LLM judges
- • Assembly → sub-agent compression, distillation
- • Feedback loop → each cycle improves routing and frameworks
Why "supply chain" and not "pipeline"? Pipelines are linear. Supply chains have routing decisions, parallel processing, quality gates, inventory management (context), and — critically — they compound. Each delivery teaches you how to source and process better next time.
The Five Stages
Stage 1 — Route
Give the agent a map before it searches.
A routing index provides information scent — vocabulary, topology, canonical terms. Without it, the agent wanders through embedding space hoping to stumble on relevance. With it, the agent navigates directly to high-value content using terms guaranteed to exist in the corpus.
Stage 2 — Explore
Send agents to investigate, not just retrieve.
Move beyond "find me the top 5 chunks" to "explore this corpus like a human researcher." Three-phase loop: scan, deep dive, backtrack. Cross-document dependencies become first-class, not an accident of chunk recall.
Stage 3 — Judge
Separate retrieval from reasoning.
The dual-query pattern: one query for broad retrieval (high recall), another for precise judgment (high precision). An LLM judge filters, refines, or rejects candidates against the real goal. This prevents domain keywords from polluting semantic search.
Stage 4 — Compress
Distill exploration into high-density signal.
Sub-agents explore in isolated contexts (200K tokens of thrashing). The main agent reads only the compiled memo (20K tokens of pure signal). Scatter-gather for cognition: parallel exploration, centralised compilation.
Stage 5 — Compound
Each cycle improves the next.
Better frameworks lead to better routing, which leads to better exploration, which leads to better compression, which leads to improved frameworks. This is a flywheel, not a pipeline. Architecture investments compound; model upgrades don't.
| Stage | Name | Job | Key Pattern |
|---|---|---|---|
| 1 | Route | Give the agent a map | Routing index with information scent |
| 2 | Explore | Investigate, don't just retrieve | Three-phase agentic search |
| 3 | Judge | Separate retrieval from reasoning | Dual-query pattern |
| 4 | Compress | Distill to high-density signal | Sub-agent scatter-gather |
| 5 | Compound | Each cycle improves the next | Kernel flywheel |
The Retrieval Maturity Ladder
Most organisations think they're advanced with AI because they've implemented RAG. Here's where they actually sit:
Level 1: System Prompt Stuffing
Paste ~2,500 tokens of context into the system prompt. Cheap and fast, but brittle: there's no search capability, and the content goes stale immediately. Where most "we're using AI" organisations actually are.
Level 2: Tool-Calling RAG
LLM with tool-calling to RAG. The "old way." Single-shot: query, top-k chunks, synthesise. No iteration, no refinement, no gap detection. The model can't reformulate queries or spot missing context.
Level 3: Agentic RAG
Agent-driven RAG. The loop matters: the agent can reformulate queries, spot gaps, ask for more. But still limited to chunk similarity — can't follow cross-document references. Where many "advanced" implementations sit today.
Level 4: Agentic Exploration with Compression
Full exploration engine: scan, deep dive, backtrack with tool access. Sub-agent compression: 200K to 20K. Dual-query judgment: separate retrieval from evaluation. The cognition supply chain lives here.
Why Architecture Compounds
Model upgrades are one-time lifts. You upgrade from Model A to Model B. Output improves marginally. Then it plateaus until the next upgrade. No compounding.
Architecture compounds. Every framework you build, every routing entry you add, every compression pattern you refine — makes the NEXT interaction better. The system learns (through the human closing the loop), not just the model. Research from Andrew Ng demonstrates that GPT-3.5 with proper agentic architecture outperforms GPT-4 alone6 — proving architecture matters more than raw model capability.
Your competitor can buy the same model tomorrow. They can't copy your framework library, your routing index, your compression patterns, or your institutional knowledge encoded into the supply chain.
This connects to Worldview Recursive Compression5: compile your domain expertise into frameworks that serve as the "kernel" loaded into every AI interaction. The supply chain IS the kernel in action.
The Industry Is Already Moving This Way
The cognition supply chain isn't theoretical. Pieces of it are already shipping in production systems:
Tavily
Search, extract, LLM answer. Two-step retrieval with judgment built in. Their /search endpoint returns concise snippets optimised for LLM ingestion; /extract pulls full cleaned content; include_answer adds LLM synthesis. This is Stages 2 + 3 commercialised.7
Claude Web Search
Agentic loop — Claude decides when to search, the API runs searches, and this can repeat multiple times in one request. The model is exploring, not just retrieving. Stage 2 as a platform feature.8
Anthropic Multi-Agent Research
Multi-agent orchestration showing 90.2% success rate vs 14–23% for single-agent approaches.9 Parallel sub-agents with compression. Stages 4 + 5 validated at scale.
The industry is building the supply chain piece by piece. We're naming the whole architecture.
The Map Ahead
Part II builds each stage in detail:
- • Chapter 3: Routing — how to give the agent a map
- • Chapter 4: Exploration — how to search properly (not just retrieve)
- • Chapter 5: Compression — how to distill 200K tokens to 20K of signal
- • Chapter 6: Compounding — why each cycle makes the system smarter
Part III then shows the supply chain applied to real workflows: proposals and research.
Key Takeaways
- • The Cognition Supply Chain has five stages: Route → Explore → Judge → Compress → Compound
- • Most organisations sit at retrieval maturity Level 1 or 2 while believing they're advanced
- • Architecture compounds; model upgrades don't
- • The industry is converging on this pattern — Tavily, Claude, Anthropic research all validate pieces of it
Give the Agent a Map
When an agent starts cold against a corpus, it has two problems: it doesn't know what words exist in your world, and it doesn't know what matters.
When an agent starts against your corpus for the first time, it faces two problems simultaneously. The vocabulary problem: it doesn't know what terms exist in your world. Your organisation has named concepts, acronyms, frameworks, and domain-specific language the model has never seen in training. The topology problem: it doesn't know what's important versus peripheral, foundational versus derivative, central versus edge-case.
Without solving both, the agent performs expensive random walks through embedding space — burning tokens, returning noise, and sometimes missing the exact content that would have answered the question perfectly.
The fix: give the agent a geography map before it starts searching.
Information Foraging Theory
Information Foraging Theory10, from cognitive science and human-computer interaction research, explains how agents — both human and AI — navigate knowledge spaces. The core concept is information scent: cues that help an agent estimate "is it worth going down this path?" before paying the cost of opening, reading, and reasoning.
Good websites beat bad ones for the same reason: people don't "search" — they forage. The better the scent (clear labels, meaningful categories), the fewer wrong turns. A well-designed navigation menu lets you estimate "will I find what I need down this path?" before clicking.
Applied to AI agents: a routing index provides information scent for the corpus. The agent can estimate value before full exploration — just like a user scanning a well-designed navigation menu.
Without information scent: The agent's only vocabulary comes from your problem statement. Those words might not match anything in your knowledge base. The agent searches with the wrong terms, retrieves irrelevant chunks, and produces generic output.
With information scent: The agent reads the map, discovers canonical terms that ARE in the corpus, and immediately searches with terms guaranteed to hit.
The Routing Index Pattern
A routing index is a machine-readable map of your knowledge corpus. It's NOT documentation. Not a README. Not a wiki. It's a control surface for agent navigation.
"Instead of starting the search empty, you're giving the LLM a geography map."
The structure of each entry (the Lane Doctrine entry, for example):
- Name: Lane Doctrine
- Importance: Prevents high-failure AI deployments by scoring projects for structural fit
- Retrieve when: evaluating or prioritising AI initiatives, deciding between automation approaches
- Tags: deployment, project selection, batch processing, governance, risk
- Related: Simplicity Inversion, Enterprise AI Spectrum, Fast-Slow Split
- URL: /frameworks/lane-doctrine.md
The crucial insight: every term in the routing index is guaranteed to exist in the corpus. This turns semantic search from "hopefully something matches" to "I know these words will hit." The vocabulary problem and the topology problem are solved simultaneously.
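As a concrete sketch, one way to hold such an entry in machine-readable form; the field values mirror the Lane Doctrine example above, but the exact schema is an assumption, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class RoutingEntry:
    """One entry in the routing index: a control surface, not documentation."""
    name: str
    importance: str           # why this framework matters, in one line
    retrieve_when: list[str]  # triggers: situations where the agent should look deeper
    tags: list[str]
    related: list[str]        # adjacent frameworks, for deliberate backtracking
    url: str                  # where the full content lives in the corpus

lane_doctrine = RoutingEntry(
    name="Lane Doctrine",
    importance="Prevents high-failure AI deployments by scoring projects for structural fit",
    retrieve_when=["evaluating or prioritising AI initiatives",
                   "deciding between automation approaches"],
    tags=["deployment", "project selection", "batch processing", "governance", "risk"],
    related=["Simplicity Inversion", "Enterprise AI Spectrum", "Fast-Slow Split"],
    url="/frameworks/lane-doctrine.md",
)

# Every term in this entry exists verbatim in the corpus, so any query the
# agent builds from these fields is guaranteed to hit.
```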
A Control Surface, Not Documentation
The routing index doesn't EXPLAIN the frameworks. It provides just enough scent for the agent to decide whether to look deeper. One-liner descriptions, not full explanations. Tags for cross-referencing, not comprehensive summaries. "Retrieve when" triggers, not usage guides.
Three capabilities a routing index enables:
Guided Exploration
Start at high-level nodes, drill down only when needed. The agent doesn't read everything — it reads the map, then reads what matters.
Backtracking by Design
If the chosen branch isn't yielding evidence, the agent can deliberately shift to an adjacent framework rather than flailing in embedding space. The "Related" field makes this deliberate.
Query Expansion
The index provides canonical terms the agent can use to formulate better queries — without polluting retrieval with client-specific noise. It's a query expansion oracle with zero false matches.
There's academic work validating this approach. RAPTOR11 builds tree-structured summaries for hierarchical retrieval — retrieving across levels of abstraction. A routing index is a lightweight, human-curated version of that idea: both high-level map (global context) and drill-down capability (local detail).
"Regardless of your retrieval strategy — RAG, agentic search, whatever — the routing index is super powerful."
Worked Example: From Cold Start to Targeted Search
Before and After: The Routing Index Effect
Before: Cold Start
- Problem: "Our manufacturing client wants to implement AI. Where should they start?"
- Agent vocabulary: "manufacturing", "AI", "implement", "start"
- RAG returns: Chunks about manufacturing processes, general AI introductions, implementation timelines
- Result: Generic advice that could have come from any blog post
The agent searched with the only words it had — problem terms that don't match corpus terms.
After: With Routing Index
- Agent reads index. Sees: Lane Doctrine ("when evaluating where to deploy AI"), Simplicity Inversion ("when choosing starting point"), Enterprise AI Spectrum ("when matching complexity to readiness")
- Agent searches with: "lane doctrine manufacturing", "simplicity inversion starting lane", "readiness diagnostic"
- RAG returns: Specific framework content about scoring deployment lanes, starting with internal batch processes, matching autonomy to governance
- Result: Specific, framework-grounded recommendations tailored to manufacturing
Same model, same corpus, same RAG. The only change: the agent got a map first.
| | Cold Start | With Routing Index |
|---|---|---|
| Query vocabulary | Problem terms only | Problem terms + canonical framework terms |
| Search precision | Low (generic matches) | High (guaranteed corpus hits) |
| Cross-references | Accidental | Deliberate (related concepts in index) |
| Backtracking | Random | Structured (adjacent entries) |
| Time to relevant content | Minutes of wandering | Seconds of navigation |
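A minimal sketch of the query-expansion step behind that difference, assuming the index is loaded as a simple list of entries. The matching heuristic below is deliberately naive tag overlap; in practice the agent reads the index and chooses for itself, and the entries shown are illustrative.

```python
# Hypothetical routing index entries, trimmed to what query expansion needs.
ROUTING_INDEX = [
    {"name": "Lane Doctrine", "tags": {"deployment", "project selection", "governance", "risk"}},
    {"name": "Simplicity Inversion", "tags": {"starting point", "internal tools", "deployment"}},
    {"name": "Enterprise AI Spectrum", "tags": {"readiness", "maturity", "governance"}},
]

def expand_query(problem_terms: set[str]) -> list[str]:
    """Build search queries from canonical framework terms, not just problem terms."""
    problem = " ".join(sorted(problem_terms))
    queries = [f'{entry["name"]} {problem}'
               for entry in ROUTING_INDEX
               if problem_terms & entry["tags"]]   # crude scent check: any tag overlap
    return queries or [problem]                    # cold start: problem terms only

print(expand_query({"manufacturing", "deployment"}))
# ['Lane Doctrine deployment manufacturing', 'Simplicity Inversion deployment manufacturing']
```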
Building Your Routing Index
Start small. 10–15 entries. Don't try to index everything. Start with your most-used frameworks, most-referenced documents, most-important concepts.
Iteration pattern: After each significant AI interaction, ask: "Was there a concept the agent should have known about but didn't?" If yes, add a routing entry. The index grows organically. Every routing entry you add improves the next interaction. The index compounds.
Time investment: 1–2 hours for the initial 10–15 entries. 5 minutes per new entry thereafter.
From Map to Exploration
The routing index is Stage 1 of the cognition supply chain. It solves the cold-start problem by providing information scent — vocabulary and topology — before the agent searches. With a map in hand, the agent can explore properly. The next chapter shows how exploration differs from simple retrieval — and why the dual-query pattern prevents retrieval from being polluted by reasoning needs.
Key Takeaways
- • Agents face two cold-start problems: vocabulary (what terms exist?) and topology (what's important?)
- • Routing indexes provide information scent — the agent estimates value before paying exploration costs
- • A routing index is a control surface, not documentation: name, importance, "retrieve when" trigger, tags, related concepts, URL
- • Works regardless of retrieval strategy (RAG, agentic, hybrid)
- • Start with 10–15 entries and iterate — the index compounds with use
Explore, Don't Just Retrieve
Keyword search returns 10 thumbnails for "kookaburra." An LLM judge picks the one showing a zoomed-in kookaburra in a gum tree. Same pattern, generalised to text.
Here's a concrete example of the problem — and the fix. An image search system built on a public photo library: keyword search for "kookaburra" returns 10 thumbnails. Some are close-ups, some distant shots, some on fences, in flight, on power lines. The search found "kookaburra" — broad recall, job done.
But what was actually wanted: a zoomed-in kookaburra sitting in a gum tree.
The fix: send a keyword search to get candidates. Send a SEPARATE rich context question to an LLM judge: "Which of these shows a zoomed-in kookaburra in a gum tree?" The search finds candidates. The judge picks the right one.
This same pattern — broad retrieval + precise judgment — generalises to text retrieval. And it's the key to Stages 2 and 3 of the cognition supply chain.
RAG's Fundamental Limit
Standard RAG: query, vector similarity, top-k chunks, synthesis. Find the chunks most semantically similar to your query. Good enough for simple, local questions: "Find the paragraph about X."
What standard RAG can't do:
- × Cross-document dependencies: "X is defined in doc A, constrained by doc B, and exceptioned by doc C." Chunk similarity doesn't represent these relationships.
- × Global context: Chunking loses the big picture.11 Each chunk is an island, divorced from its position in the larger argument.
- × Dependency-shaped knowledge: Most real enterprise knowledge IS dependency-shaped — policies reference contracts, specs depend on other specs, understanding requires traversal.
Chunking loses context. Semantic similarity looks at a few chunks similar to your query and completely ignores cross-references.12 If your RAG works brilliantly for "find me a quote about X" but fails for "how does policy X interact with exception Y from contract Z" — you're hitting RAG's architectural ceiling, not a model limitation.
Agentic File Search — Exploration as Retrieval
"Stop pretending retrieval is a single query-time lookup. Make it an agentic investigation."
Coding agents — Claude Code, Cursor, Copilot — don't rely on semantic similarity alone. They skim, search, jump to definitions, follow imports, "open the next file because that file points to it."13 The insight: generalise this pattern from code to ALL documents.
Three-Phase Exploration
Phase 1: Parallel Scan
Preview all docs quickly. LLM identifies which potentially contain relevant information by reading starts/headers/summaries.
Phase 2: Deep Dive
Fully parse and read only the promising docs. LLM identifies missed cross-references during reading.
Phase 3: Backtrack
Follow cross-references to docs missed in the initial scan. The agent picks up what it now knows it needs.
This three-phase exploration pattern — scan, deep dive, backtrack — turns retrieval from a single lookup into an investigation. The tools are simple: scan folder, preview, parse, read, regex search, glob. It's "Claude Code energy" for document corpora, not just code.
The key difference from RAG: cross-document dependencies become first-class. The agent FOLLOWS references rather than hoping chunk similarity catches them accidentally.
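A skeletal version of the three-phase loop, assuming a folder of markdown documents and deliberately crude stand-ins for the LLM relevance check and cross-reference detection. The helper names are illustrative, not an established API.

```python
from pathlib import Path
import re

def preview(path: Path, n_chars: int = 800) -> str:
    """Phase 1 helper: read just the start of a document (headers, summary)."""
    return path.read_text(errors="ignore")[:n_chars]

def looks_relevant(text: str, goal: str) -> bool:
    """Crude stand-in for an LLM relevance check against the goal."""
    return any(term.lower() in text.lower() for term in goal.split())

def find_references(text: str) -> list[str]:
    """Stand-in for spotting cross-references such as 'see policy-x.md'."""
    return re.findall(r"see\s+([\w\-]+\.md)", text, flags=re.IGNORECASE)

def explore(corpus_dir: str, goal: str) -> dict[str, str]:
    docs = list(Path(corpus_dir).glob("**/*.md"))
    # Phase 1: parallel scan. Preview everything cheaply.
    to_read = [d for d in docs if looks_relevant(preview(d), goal)]
    findings: dict[str, str] = {}
    seen: set[str] = set()
    while to_read:
        doc = to_read.pop()
        if doc.name in seen:
            continue
        seen.add(doc.name)
        # Phase 2: deep dive. Fully read only the promising docs.
        text = doc.read_text(errors="ignore")
        findings[doc.name] = text
        # Phase 3: backtrack. Follow cross-references missed in the initial scan.
        for ref in find_references(text):
            ref_path = Path(corpus_dir) / ref
            if ref_path.exists() and ref not in seen:
                to_read.append(ref_path)
    return findings
```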
The Dual-Query Pattern
Here's the core insight most retrieval systems miss: retrieval and reasoning want different inputs. Conflating them — which is what everyone does — guarantees mediocre results.
Consider the manufacturing client problem. Your semantic query: "problems with AI implementation." What you really need: "manufacturing client, early-stage, struggling to find where to start." If you put "manufacturing" into the semantic search, you'll get content about manufacturing PROCESSES — completely off-target. The manufacturing context belongs in the judgment phase, not the retrieval phase.
The pattern separates these concerns:
Retrieval Query
Broad, recall-heavy. "AI implementation failure modes, where to start." Don't miss anything. Optimised for coverage.
Goal Specification
Rich, intention-heavy. "Manufacturing client, early-stage, needs safe starting lane, risk-averse board." Full context of what you actually need. Optimised for precision.
Judge / Reranker
LLM examines retrieved candidates against the goal spec. Selects, refines, or rejects. You get broad recall AND precise selection.
"AI implementation failure modes, starting points, common mistakes"
[Broad, keyword-focused, optimised for recall]
GOAL SPECIFICATION (sent to LLM judge):
"We have a manufacturing client (500 employees, early-stage AI,
risk-averse board). They need to identify their safest starting
lane. Focus on deployment patterns that minimise governance burden
and maximise early wins."
[Rich, intention-focused, optimised for precision]
JUDGE INSTRUCTION:
"Review the retrieved candidates against the goal specification.
Select the 3 most relevant. Explain why each was selected.
Note any gaps the retrieval didn't cover."
The kookaburra proof shows this works in production: search query = "kookaburra" (broad). Goal spec = "zoomed-in kookaburra in a gum tree" (specific). Judge = LLM reviews thumbnails against goal spec, picks the best match. Same pattern, proven and generalised.
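A minimal sketch of the split, with `vector_search` and `call_llm` as placeholders for your retrieval stack and model client. Both helpers are assumptions; the prompts echo the example above.

```python
def vector_search(query: str, top_k: int = 20) -> list[str]:
    """Placeholder: swap in your vector store's similarity search."""
    return [f"candidate chunk {i} for query: {query}" for i in range(top_k)]

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your model client."""
    return "[judged selection + reasons + gaps]"

def dual_query(retrieval_query: str, goal_spec: str, top_n: int = 3) -> str:
    # Broad, recall-heavy retrieval: don't miss anything.
    candidates = vector_search(retrieval_query, top_k=20)
    # Rich, intention-heavy judgment: evaluate candidates against the real goal.
    judge_prompt = (
        f"Goal specification:\n{goal_spec}\n\n"
        f"Candidates:\n" + "\n---\n".join(candidates) + "\n\n"
        f"Select the {top_n} most relevant candidates, explain why each was "
        f"selected, and note any gaps the retrieval didn't cover."
    )
    return call_llm(judge_prompt)

result = dual_query(
    retrieval_query="AI implementation failure modes, starting points, common mistakes",
    goal_spec=("Manufacturing client, 500 employees, early-stage AI, risk-averse board. "
               "They need their safest starting lane with minimal governance burden."),
)
print(result)
```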
The Industry Is Converging on This Pattern
This isn't theoretical. The search-then-judge pattern is already shipping:
Tavily implements search, extract, then LLM answer.7 Their /search endpoint returns concise snippets optimised for LLM ingestion. /extract pulls full cleaned content. include_answer adds LLM synthesis. This IS the dual-query pattern commercialised: search (broad) then extract (relevant) then answer (judged).
Claude web search operates as an agentic loop — Claude decides when to search, the API runs searches, and this can repeat multiple times in one request.8 The model is exploring, not just retrieving.
LlamaIndex file-explorer agents match RAG quality for complex queries but with higher latency.14 Better for background/async tasks while RAG wins for real-time. The discriminant variable: latency tolerance vs depth requirements. This maps directly to the Fast-Slow Split: exploration is the slow lane doing heavy cognition; interaction is the fast lane.
The Hybrid Architecture
You don't choose RAG OR exploration. You use both.
RAG vs Exploration: Different Tools for Different Jobs
RAG (Fast Lane)
- • Low-latency, single-document queries
- • "Find me the section about..."
- • Broad candidate generation
- • The broad net
Exploration (Slow Lane)
- • Dependency-shaped, cross-document queries
- • "How does X interact with Y given Z?"
- • Deep truth-finding with audit trail
- • The scalpel
Integration pattern: Use RAG results as starting points for exploration. RAG generates candidates; exploration agents follow the most promising ones, reading full documents, following cross-references, building a complete picture.
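A small sketch of that handoff, where RAG proposes starting points and exploration agents take it from there. Both helpers are placeholders and the names are illustrative.

```python
def vector_search(query: str, top_k: int = 5) -> list[str]:
    """Placeholder: fast-lane RAG returning candidate document paths."""
    candidates = ["policies/ai-governance.md", "contracts/msa-acme.md"]  # illustrative
    return candidates[:top_k]

def explore_from(doc_path: str, goal: str) -> str:
    """Placeholder: slow-lane exploration starting at one document,
    following cross-references and returning a compiled finding."""
    return f"[findings from {doc_path} and its cross-references, judged against: {goal}]"

def hybrid_answer(question: str, goal: str) -> list[str]:
    candidates = vector_search(question)                  # the broad net, low latency
    return [explore_from(c, goal) for c in candidates]    # the scalpel, per candidate
```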
From Exploration to Compression
The dual-query pattern and agentic exploration are Stages 2 and 3 of the cognition supply chain. But exploration generates volume — 200K tokens of thrashing, contradictions, and dead ends. The next chapter shows how to compress this into the 20K tokens of signal the main agent actually needs.
Key Takeaways
- • RAG fails for dependency-shaped knowledge — cross-document, cross-reference questions
- • Agentic exploration generalises coding-agent patterns to document corpora: scan → deep dive → backtrack
- • The dual-query pattern separates retrieval (broad, recall-heavy) from judgment (rich, intention-heavy)
- • The industry is already converging: Tavily, Claude web search, LlamaIndex file explorers
- • Hybrid is the right answer: RAG for speed, exploration for depth
Compress Through Sub-Agents
A sub-agent spends 200,000 tokens on research. The outcome is 20,000 tokens of applicable findings. The 200,000 vanish — but the 20,000 that remain are pure signal.
A sub-agent might spend 200,000 tokens on its work — the original briefing, the research, the dead ends, the contradictions, the writing and rewriting. But its outcome is 20,000 tokens of applicable research. You've taken the framework roadmap, expanded it for this particular client, then compressed it back.
200,000 tokens vanish. But the 20,000 that remain are pure signal. This is Stage 4 of the cognition supply chain: the compression step that turns messy exploration into high-density, decision-ready output.
The Scatter-Gather Pattern for Cognition
The architecture is clean. The main agent is the orchestrator: it holds the map, the client goal, evaluation criteria, and "don't be stupid" constraints. It preserves its context window for synthesis and judgment — the expensive work.
Sub-agents are the explorers. Each starts with a clean slate. Goes deep on a specific angle. Less biased by the main agent's current narrative because they don't share context.
The output is distilled artefacts: 20K tokens of "useful truth" instead of 200K tokens of wandering.
Why does context isolation matter so much? Sub-agents can thrash around, contradict themselves, chase weird leads — and the main agent never sees the mess. The main agent only accepts the compiled memo. It's quality control by architecture, not by hope. Messy intermediate reasoning never pollutes the parent context.
The economics: burn cheap tokens on exploration (sub-agent private contexts are disposable). Preserve expensive main-context tokens for synthesis and judgment (the orchestrator's context is precious real estate).
"200,000 tokens vanish, but the 20,000 that remain are pure signal."
Why Sub-Agents Work So Well
Anthropic's own research validates this pattern directly, showing multi-agent orchestration achieves a 90.2% success rate versus 14–23% for single-agent approaches in their research evaluations:9
"The essence of search is compression: distilling insights from a vast corpus. Subagents facilitate compression by operating in parallel with their own context windows, exploring different aspects of the question simultaneously."15
Parallel tool calling cuts research time by up to 90% for complex queries.16 Multi-agent research systems excel especially for breadth-first queries with multiple independent directions.17
Sub-agents operate as ephemeral sandboxes — spin up with a narrow brief, explore deeply, emit a clean artefact, terminate. This is Context Engineering's fork-exec pattern: the Unix model of process isolation applied to cognition.18
The Priming Problem — And How to Solve It
Sub-agents start empty. They don't share the main agent's context. You've got to prime them about what they're trying to do, which is a real hassle.
The mistake most people make: priming like they're telling a story. Long, narrative descriptions of the context, the background, the history, the nuances. This wastes tokens, is imprecise, and risks the sub-agent misunderstanding the mission entirely.
The fix: stop priming like a story. Start priming like a function call. A briefing packet with a stable schema.
The Briefing Packet Schema
| Field | Purpose | Example |
|---|---|---|
| Mission | What decision/outcome are we enabling? | "Identify which 3 of our 30 frameworks best apply to this manufacturing client" |
| Non-goals | What NOT to spend tokens on | "Don't evaluate financial ROI. Don't compare vendors." |
| Constraints | Governance, risk, tools allowed | "Australian context only. Use RAG search, not web." |
| Client context | The 10 facts that matter (not 100) | "500 employees, manufacturing, early-stage AI, risk-averse board" |
| Framework anchors | Relevant named frameworks + one-liners | "Lane Doctrine (deploy where physics is on your side), Simplicity Inversion (start with 'complex' internal tools)" |
| Deliverable format | Exactly what to return | "Return: (1) findings, (2) implications, (3) recommended moves, (4) citations, (5) open questions" |
Schema > narrative: the sub-agent knows exactly what job it has, what to ignore, and what format to return. No ambiguity. No wasted tokens on understanding intent.
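One way to make the schema concrete: a dataclass whose fields mirror the table above, with a rendering step that keeps the briefing function-call-shaped rather than narrative. The rendering format itself is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class BriefingPacket:
    mission: str                  # what decision/outcome we are enabling
    non_goals: list[str]          # what NOT to spend tokens on
    constraints: list[str]        # governance, risk, tools allowed
    client_context: list[str]     # the 10 facts that matter, not 100
    framework_anchors: list[str]  # relevant named frameworks + one-liners
    deliverable_format: str       # exactly what to return

    def to_prompt(self) -> str:
        """Render the packet as a function-call-style briefing, not a story."""
        return "\n".join([
            f"MISSION: {self.mission}",
            "NON-GOALS: " + "; ".join(self.non_goals),
            "CONSTRAINTS: " + "; ".join(self.constraints),
            "CLIENT CONTEXT: " + "; ".join(self.client_context),
            "FRAMEWORK ANCHORS: " + "; ".join(self.framework_anchors),
            f"DELIVERABLE FORMAT: {self.deliverable_format}",
        ])

packet = BriefingPacket(
    mission="Identify which 3 of our 30 frameworks best apply to this manufacturing client",
    non_goals=["Don't evaluate financial ROI", "Don't compare vendors"],
    constraints=["Australian context only", "Use RAG search, not web"],
    client_context=["500 employees", "manufacturing", "early-stage AI", "risk-averse board"],
    framework_anchors=["Lane Doctrine (deploy where physics is on your side)",
                       "Simplicity Inversion (start with 'complex' internal tools)"],
    deliverable_format="(1) findings, (2) implications, (3) recommended moves, "
                       "(4) citations, (5) open questions",
)
```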
The Self-Priming Trick
Sometimes the sub-agent misunderstands the mission. It goes deep on the wrong angle, burns 200K tokens, and returns irrelevant findings. Expensive failure.
The fix: force the sub-agent to first produce a tiny "I understand the job" recap in your framework language before it starts researching.
Two benefits: it catches misunderstanding early (if the recap is wrong, you can correct before burning exploration tokens), and it aligns subsequent reasoning (the sub-agent is now reasoning in your framework handles, making its later work more coherent).
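A small sketch of that gate in code, with `call_subagent` and the recap check as placeholders. All names are assumptions; the check could be a human glance or another LLM call.

```python
def call_subagent(prompt: str) -> str:
    """Placeholder for a sub-agent call in its own isolated context."""
    return "[sub-agent response]"

def recap_is_on_mission(recap: str, briefing: str) -> bool:
    """Placeholder: an LLM (or a human) checks the recap against the briefing."""
    return True

def run_with_self_priming(briefing: str) -> str:
    # Step 1: a cheap recap (~200 tokens) in our framework language, before any research.
    recap = call_subagent(briefing + "\n\nBefore researching, restate the job in our "
                          "framework language in under 200 tokens.")
    # Step 2: gate. Correct misunderstanding now, not after 200K tokens of exploration.
    if not recap_is_on_mission(recap, briefing):
        raise ValueError(f"Sub-agent misread the mission: {recap}")
    # Step 3: only then release the expensive exploration budget.
    return call_subagent(briefing + "\n\nYour confirmed understanding:\n" + recap +
                         "\n\nNow carry out the mission.")
```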
Token Economics of Sub-Agent Compression
The numbers from production workflows:
Token Economics: With and Without Sub-Agents
| | Exploration | Compilation | What Main Agent Sees |
|---|---|---|---|
| 1 sub-agent | 200K tokens | 20K output | 20K pure signal |
| 5 parallel sub-agents | 1M tokens total | 100K combined output | 100K curated signal |
| Without sub-agents | 1M tokens in main context | N/A | 1M noise + signal mixed |
The sub-agent pattern doesn't just save tokens — it saves attention. The main agent's context stays clean.
The main agent's context window is precious — every token affects attention quality. Sub-agents burn cheap tokens (private context, disposable). The main agent reads expensive tokens (curated, high-signal). It's like hiring 10 research assistants who each write you a one-page memo. You read 10 pages, not 1,000.
Practical Patterns for Sub-Agent Compression
Pattern 1: Research Agent
Receives briefing packet + access to RAG/search. Explores a specific question. Returns: findings + evidence + citations + open questions.
Pattern 2: Framework Application Agent
Receives one framework + client context. Applies the framework to the client. Returns: applicability assessment + specific recommendations + caveats.
Pattern 3: Critique Agent
Receives a draft output + evaluation criteria. Reviews for gaps, contradictions, unsupported claims. Returns: issues found + severity + suggested fixes.
Pattern 4: Compiler Agent
Receives multiple sub-agent outputs. Merges, resolves conflicts, eliminates duplications, synthesises into coherent narrative. Returns: unified findings with conflict resolution notes.
From Compression to Compounding
The supply chain now routes, explores, judges, and compresses. But the real power emerges in Stage 5: when each cycle feeds the next. The compounding loop is what turns a pipeline into a flywheel.
Key Takeaways
- • Main agent = orchestrator (preserve context for synthesis). Sub-agents = explorers (burn tokens in isolation).
- • Context isolation is quality control by architecture — messy reasoning never pollutes the parent
- • Prime sub-agents with a briefing packet (schema), not a narrative. Mission, Non-goals, Constraints, Client context, Framework anchors, Deliverable format.
- • Self-priming trick: force a 200-token recap before 200K tokens of exploration
- • The 10:1 compression ratio: 200K tokens vanish, 20K of signal remain
The Compounding Loop
Chess move-ordering — opening books bias search toward high-value lines first. Same principle applied to cognition.
In chess programming, opening books let the engine play known-good moves instantly — zero search cost. Move-ordering heuristics bias the search toward the most promising branches first.19 Iterative deepening searches shallow, finds candidates, then deepens the best lines under a time budget.
This is exactly what the cognition supply chain does — but for knowledge work. Frameworks are the opening book. The routing index is move ordering. Agentic exploration is iterative deepening. And after each game, the engine learns which openings worked.
The difference between a pipeline and a flywheel: a pipeline processes inputs into outputs. A flywheel processes inputs into outputs AND improves itself. Stage 5 of the supply chain is what makes it compound.
The Compounding Information Foraging Loop
When agents can retrieve, each read changes the next question. The agent starts with a broad query informed by the routing index (Chapter 3). Retrieval returns candidates. The agent reads them. Reading changes the agent's understanding. The next query is sharper. Sharper query → more relevant retrieval → deeper understanding → even sharper query.20
This is information foraging with feedback — a compounding loop where retrieval improves reasoning AND reasoning improves retrieval.
The agent doesn't follow a predetermined sequence. It walks its own path through the frameworks based on what it thinks will help solve the current issue. Give a thin overview of frameworks plus a mechanism to search deeper. The agent sees the one-liners, gets the lay of the land, then digs into what's relevant. It's not a script — it's directed foraging.
This is iterative deepening in action: start with a thin map (routing index), pick a promising branch (highest-scent entry), expand it (deep search), re-score the landscape, expand again. Each cycle is informed by everything learned so far. The thin map becomes progressively richer without ever being loaded in full.
Frameworks as Move-Ordering Heuristics
The chess analogy isn't decorative — it's structurally accurate. Every element of the cognition supply chain maps to a chess search mechanism.
The Chess-Cognition Parallel
Chess Engine
- • Opening books — known-good positions, zero search cost
- • Move-ordering heuristics — explore best candidates first
- • Iterative deepening — search shallow, then deepen best lines
- • Evaluation function — score positions against criteria
- • Transposition table — cache explored positions
Cognition Supply Chain
- • Routing index — canonical terms, instant navigation
- • Frameworks — bias search toward high-value paths
- • Agentic exploration — scan, deep dive, backtrack
- • Dual-query judgment — score against goal specification
- • Documented dead ends — don't re-explore what failed
Frameworks act as move-ordering heuristics: they bias the search toward high-value lines first. The Lane Doctrine says "don't start in the hardest lane." The Fast-Slow Split says "separate the talker from the thinker." Governance-first says "build the safety infrastructure before deploying." These aren't suggestions — they're search-space pruning. They tell the agent which branches to explore FIRST and which to skip.
The routing index functions as an opening book: fast access to known-good structures and sharp definitions so the system doesn't waste time reinventing concepts already solved. The agent plays the opening instantly, then starts thinking deeply once it's past known territory.
Agentic exploration is iterative deepening: search shallow first to find candidate lines, then deepen the best ones under a token budget.21 Just as a chess engine won't spend 10 minutes analysing a line it scored poorly in the first pass, the cognition supply chain won't burn 200K tokens exploring a path the routing index flagged as low-value.
The dual-query pattern (Chapter 4) serves as the evaluation function: score candidates against the goal specification, not just by pattern matching. The manufacturing context stays in the evaluation, not the search query — just as a chess evaluation function considers piece position without rerunning the move generator.
Why move ordering matters: without it, the agent explores branches in random order, burning tokens on low-value paths. With framework-biased move ordering, high-value paths are explored first. This is chess-style search through idea space, where frameworks act as move-ordering heuristics — systematic exploration that prunes the tree before wasting compute on dead branches.
The Meta-Loop
The five stages aren't a linear pipeline. They're a flywheel where each revolution starts from a higher baseline than the last.
The Kernel Flywheel
Each cycle improves the next.22 Every framework you build today solves problems you haven't encountered yet. Every routing entry you add makes the next search more precise. Every sub-agent output you compress teaches you what's worth compressing.
This connects directly to the principle of compiling domain expertise into frameworks that serve as the "kernel" loaded into every AI interaction. The kernel flywheel IS the compounding loop: extract patterns → compress into frameworks → prefetch solutions → apply to new contexts → learn from application → improve kernel. The supply chain isn't just using the kernel — it's continuously refining it.
When Compounding Stalls — And How to Fix It
The compounding effect won't stay exponential forever. It hits diminishing returns unless you add two things:
1. A Scoring Function
What counts as "progress"? Risk reduced. Uncertainty collapsed. Decision narrowed. Value unlocked.
Without measuring, you can't tell if you're compounding or just spinning. The scoring function turns subjective "this feels better" into objective "we resolved 3 open questions and identified 2 new risks."
2. A Compression Step After Each Deepening
Convert what you learned into a tighter map. New rule. New heuristic. New "don't do this" constraint. New routing entry.
If you explore but don't compress the learning back into the system, the next cycle doesn't benefit. You explored, but you didn't compound.
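A sketch of the loop with those two additions made explicit (a scoring function and a compress-back step), using placeholder helpers throughout; all names and the scoring heuristic are assumptions.

```python
def explore_branch(query: str) -> dict:
    """Placeholder: one deepening cycle of retrieve, read, reason."""
    return {"findings": f"[findings for {query}]", "open_questions": 1, "risks_found": 2}

def score_progress(result: dict) -> float:
    """What counts as progress: uncertainty collapsed, risks surfaced, decisions narrowed."""
    return result["risks_found"] - 0.5 * result["open_questions"]

def compress_back(result: dict, routing_index: list[str]) -> None:
    """Convert what was learned into a tighter map: new routing entry, rule, or dead end."""
    routing_index.append(f"entry derived from: {result['findings']}")

def compounding_loop(seed_query: str, routing_index: list[str], budget: int = 3) -> None:
    query = seed_query
    for cycle in range(budget):
        result = explore_branch(query)
        if score_progress(result) <= 0:       # no measurable progress: stop spinning
            break
        compress_back(result, routing_index)  # the next cycle starts from a higher baseline
        query = f"{seed_query} (refined after cycle {cycle + 1})"
```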
The End-to-End Pipeline Pattern
Here's the complete architecture — all five stages together, showing how data flows through the supply chain and how each cycle feeds the next:
The Complete Cognition Supply Chain
1. Router Index (The Map)
Agent reads routing index → discovers canonical terms → selects exploration starting points. The map provides information scent (Chapter 3) so the agent navigates directly to high-value content.
2. Exploration Agents (Cheap, Parallel)
Multiple sub-agents scan → deep dive → backtrack across the corpus. Each starts with a briefing packet (Chapter 5). Each follows cross-document dependencies that RAG would miss.
3. Judge (Dual-Query Reranker)
Each exploration output evaluated against the goal specification (Chapter 4). LLM judge selects, refines, or rejects. Manufacturing keywords stay in the goal spec, not the search query.
4. Compiler Agent (Merge, Resolve, Compress)
Takes judged outputs from multiple explorers. Merges findings, resolves conflicts, eliminates duplication, compresses into high-density output. 200K tokens of exploration → 20K tokens of signal.23
5. Main Agent Reads Result (Synthesis)
The orchestrator reads 20K–100K tokens of compiled, judged, compressed signal. Synthesises into the final deliverable. Context stays clean. Attention stays focused.
6. → Next Cycle Starts Smarter
New routing entries from what was discovered. Refined frameworks from what was learned. Documented dead ends from what failed. The supply chain improves for next time.
This is the full cognition supply chain: from raw knowledge → through routing, exploration, judgment, compression → to client-ready, domain-specific output → with compounding improvement each cycle. Each stage has been built in Part II: routing (Chapter 3), exploration and judgment (Chapter 4), compression (Chapter 5), and now the compounding loop that ties them into a flywheel.
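As a closing sketch, the whole pipeline reduces to a short orchestration skeleton, with every helper standing in for the concrete patterns built in Chapters 3-5. The function names are assumptions, not a prescribed API.

```python
def read_routing_index() -> list[str]:                 # Stage 1: the map
    return ["Lane Doctrine", "Simplicity Inversion", "Fast-Slow Split"]

def run_explorer(entry: str, briefing: str) -> str:    # Stage 2: cheap, parallel exploration
    return f"[memo on {entry} for: {briefing}]"

def judge(memo: str, goal_spec: str) -> bool:          # Stage 3: dual-query judgment
    return True

def compile_memos(memos: list[str]) -> str:            # Stage 4: merge, resolve, compress
    return "\n\n".join(memos)

def synthesise(compiled: str, goal_spec: str) -> str:  # Stage 5a: main agent, clean context
    return f"[deliverable grounded in:\n{compiled}]"

def update_supply_chain(deliverable: str) -> None:     # Stage 5b: compound for the next cycle
    pass  # add routing entries, refine frameworks, record dead ends

def cognition_supply_chain(briefing: str, goal_spec: str) -> str:
    entries = read_routing_index()
    memos = [run_explorer(e, briefing) for e in entries]
    kept = [m for m in memos if judge(m, goal_spec)]
    compiled = compile_memos(kept)
    deliverable = synthesise(compiled, goal_spec)
    update_supply_chain(deliverable)
    return deliverable
```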
From Chat Toy to Operating System
What starts as "better retrieval" becomes something more fundamental. Track the trajectory:
"The system becomes less like a chat toy and more like an operating system for decision-making."
The system doesn't just answer questions. It accumulates intelligence. It builds institutional memory through its framework library and routing index. It improves with use. Every interaction either teaches it a new route or sharpens an existing one.
And this creates a genuine competitive moat. Your competitor can buy the same model tomorrow. They can subscribe to the same tools. They can even hire the same developers. What they can't copy: your framework library, your routing index, your compression patterns, your institutional knowledge encoded into the supply chain. Those took cycles to build. Each cycle compounded the last.
Architecture compounds. Models don't.24 That's the thesis of this entire ebook, proven stage by stage through Part II.
From Theory to Practice
Part II has built the full pipeline: route (Chapter 3), explore and judge (Chapter 4), compress (Chapter 5), compound (Chapter 6). Part III shows this same architecture applied to two real-world domains — proposals and research — then provides a practical implementation guide for building your own supply chain, stage by stage.
Key Takeaways
- • Compounding happens when retrieval improves reasoning AND reasoning improves retrieval
- • Frameworks are move-ordering heuristics that bias agent search toward high-value paths first
- • The meta-loop: time → cognition → artefacts → compression → better priors → better search
- • Compounding stalls without two additions: a scoring function and a compression step after each cycle
- • The complete pipeline: Router → Exploration agents → Judge → Compiler → Main agent → Next cycle starts smarter
- • Architecture creates a moat competitors can't copy by upgrading their model
The Proposal Supply Chain
30+ frameworks applied to a client. Only 6 make the proposal. The supply chain decides which 6 — and why.
You have 30+ proprietary frameworks. A new client just landed. They're in manufacturing, 500 employees, early-stage AI, risk-averse board. You need a bespoke proposal that applies exactly the right frameworks to exactly their situation.
The old way: read every framework manually. Decide which ones apply. Write the proposal from scratch. Takes days. And by framework #15, you've forgotten the nuances of framework #3.
The supply chain way: route the client context through 30+ frameworks using sub-agents. Judge which 6 are most valuable AND most differentiated. Compress into a proposal that reads like you've been studying their business for weeks. Hours, not days.
Mapping the Supply Chain to Proposals
The proposal workflow is the cognition supply chain (Chapter 2) applied to customisation at scale. Each stage maps directly:
Stage 1 — Route
The routing index identifies which frameworks MIGHT apply to this client. Client tags — "manufacturing", "500 employees", "early-stage AI", "risk-averse board" — are matched against each framework's "retrieve when" triggers.
Result: 15–20 frameworks flagged as potentially relevant (broad recall). The Lane Doctrine, Simplicity Inversion, Enterprise AI Spectrum, Three-Lens Framework, and a dozen others pass the initial filter.
Stage 2 — Explore
Sub-agents apply each framework to the client's specific context. Each receives one framework + client context via a briefing packet (Chapter 5). Each produces an applicability assessment, specific recommendations, caveats, and examples relevant to manufacturing.
Scale: 15–20 parallel explorations × ~10K tokens each = ~150K–200K tokens of exploration. Each sub-agent works in isolation — no attention fatigue, no cross-contamination between frameworks.
Stage 3 — Judge
Selection with diversity control. Not "pick the 6 best" (haphazard) — but sequential selection that ensures each framework adds unique value. The goal spec for the judge: "Which framework provides the most unique value for THIS client that the other selected frameworks don't already cover?"
Method: Pick the best one. Then the next best one that's different enough. Repeat until 6 — or until no more genuinely distinct value exists.
Stage 4 — Compress
15–20 framework analyses compressed into 6 selected proposal sections. Each section: framework name, why it applies to this client, specific recommendations, expected outcomes.
Output: ~20K tokens of proposal content from ~200K tokens of exploration. The 180K tokens of rejected frameworks aren't wasted — they've been evaluated and documented as "considered but not selected."
Stage 5 — Compound
Each proposal improves the framework library. Which frameworks consistently get selected for manufacturing clients? (Strengthen routing.) Which are never selected? (Refine or retire.) What client-specific patterns emerged? (New routing entries.) What gaps appeared? (New frameworks needed.)
Effect: The fifth proposal costs a fraction of the first. The supply chain infrastructure is reusable — only the client context changes.
The Selection Trick — Pick One at a Time
Tell a model to "pick 6 frameworks" and it selects haphazardly — often picking frameworks that overlap heavily, missing important diversity. The output feels redundant. Three of the six say variations of the same thing.
The fix: sequential selection with diversity constraints.
Sequential Selection Protocol
Round 1
"Pick the single most valuable framework for this manufacturing client." → Lane Doctrine (deploys where physics is on your side — critical for risk-averse boards).
Round 2
"Given Lane Doctrine is selected, pick the next most valuable that adds something Lane Doctrine doesn't cover." → Three-Lens Framework (stakeholder alignment — different concern from deployment lane).
Round 3
"Given Lane Doctrine + Three-Lens, pick the next..." → Simplicity Inversion (where to start — neither deployment lanes nor stakeholders).
Rounds 4–6
Continue until 6 selected — or until the model can't find another genuinely distinct, useful framework. Reaching 5 instead of 6 is signal, not failure.
This is Maximal Marginal Relevance (MMR) from information retrieval25: novelty-weighted selection that prevents near-duplicates. The same principle used in search result diversification — applied to framework selection.
| | Batch ("Pick 6") | Sequential ("Pick 1, then next 1") |
|---|---|---|
| Diversity | Poor — often selects overlapping frameworks | High — each selection explicitly diversifies |
| Quality | Moderate — compromises across all 6 | High — each pick optimised independently |
| Exception handling | None — always returns 6 | Natural — stops when no more distinct value |
| AI reliability | Models struggle with multi-criteria bulk selection | Models excel at single-choice comparison |
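A sketch of the sequential protocol. The "adds distinct value" judgment is reduced to a placeholder that a real implementation would hand to an LLM judge along with the full goal specification; all names are assumptions.

```python
def pick_best(candidates: list[str], already_selected: list[str], goal_spec: str) -> str | None:
    """Placeholder for the LLM judge. The real prompt would be roughly:
    'Given the goal specification and the already-selected frameworks, pick the
    single candidate that adds the most value they don't already cover, or
    answer NONE if nothing genuinely distinct remains.'"""
    remaining = [c for c in candidates if c not in already_selected]
    return remaining[0] if remaining else None   # stand-in for the real judgment

def sequential_select(frameworks: list[str], goal_spec: str, target: int = 6) -> list[str]:
    selected: list[str] = []
    while len(selected) < target:
        pick = pick_best(frameworks, selected, goal_spec)
        if pick is None:            # stopping at 5 is signal, not failure
            break
        selected.append(pick)
    return selected

chosen = sequential_select(
    ["Lane Doctrine", "Three-Lens Framework", "Simplicity Inversion",
     "Enterprise AI Spectrum", "Fast-Slow Split", "Governance-First",
     "Economies of Specificity"],
    goal_spec="Manufacturing client, 500 employees, early-stage AI, risk-averse board.",
)
```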
Meta-Credibility — The Proposal IS the Proof
"The proposal IS the proof. The way you sold them is the way you'll serve them."
The proposal itself demonstrates the supply chain in action. The client reads a bespoke, framework-grounded, deeply researched document and the implicit message is unmistakable: "If they can produce this for me before we've even signed, imagine what they can build FOR my business."30
You didn't produce a generic slide deck. You routed their specific context through proprietary frameworks, explored each one's applicability, judged which combination provides maximum unique value, and compressed it into a coherent narrative. The method IS the message.
This is the principle of economies of specificity27 applied to consulting. Industrial-era economics favoured standardisation — one niche, one offer, one template. When cognition is cheap and parallel, the economics flip: customising perfectly for each client costs less than maintaining generic materials.
The frameworks are source code. The proposal is the compiled binary. Fix the source once, and all future compilations improve. Each proposal is a regenerable artefact — not a one-off document crafted through heroic effort.
Token Economics of Proposal Generation
The production numbers: 30 frameworks × sub-agent application = ~300K exploration tokens. Selection and compression produces ~20K tokens of proposal content. Total API cost: roughly $3–5 for a proposal that reads like weeks of research.
The compounding economics are even more compelling. The first proposal is the most expensive — building the routing index, refining the briefing packets, tuning the selection protocol. By the fifth proposal, the infrastructure is proven.29 Only the client context changes. The marginal cost drops while the quality increases.
And the quality advantage compounds too. Each framework was applied by a fresh sub-agent with no attention fatigue. The judge reviewed each candidate against the specific client goal — no "good enough" compromises. The compiler resolved conflicts and ensured coherence. Human judgment is preserved for the FINAL review, not wasted on the exploration.
Same Architecture, Different Domain
This isn't a new pattern — it's the same five stages from Part II applied to proposals. The architecture is reusable; only the content changes. The next chapter applies the same supply chain to research and knowledge synthesis — where the ebook you're reading right now serves as the worked example.
Key Takeaways
- • Route: routing index identifies potentially relevant frameworks (broad recall from 30+ to 15–20)
- • Explore: sub-agents apply each framework to the client in parallel (150K–200K tokens of expansion)
- • Judge: sequential selection (pick 1, then next 1) ensures diversity and quality
- • Compress: 200K exploration → 20K proposal content (10:1 compression)
- • Compound: each proposal improves routing and framework selection for the next one
- • Meta-credibility: the proposal IS the proof — the method demonstrates the value
The Research Supply Chain
This ebook was written using a cognition supply chain. The conversation that produced the source material IS a worked example.
A model with 18-month-stale training data.5 No knowledge of current AI capabilities. Yet within minutes, producing expert-level analysis of AI deployment patterns, governance architecture, and implementation strategy.
How? Not by upgrading the model. By routing the conversation through compressed frameworks, exploring with agentic retrieval, judging against specific goals, and compressing findings back into reusable artefacts.
This chapter traces the supply chain through a research workflow — and in doing so, demonstrates the very thesis of this ebook.
The Research Problem: AI Asking AI About AI
"If you ask AI how to implement AI, it won't even scratch the surface. It'll be wrong, inconsistent. But ask AI about blue whales and you've got heaps of knowledge and background."
This is the "unstable domain" problem from Chapter 1.5 The model's training data is stale for fast-moving fields. Without external knowledge, it hallucinates plausibly — producing confident-sounding advice that's months or years behind current practice.
The naive solution: give it access to the internet. Let it search. Let it read. But without a routing index, the agent searches with generic terms, retrieves generic results, and produces generic synthesis. The internet is the ultimate high-noise, low-signal corpus.
The supply chain solution: don't ask the model to know. Give it a map of what exists, the tools to explore, and the judgment layer to filter for relevance.
The Research Supply Chain — Stage by Stage
Stage 1 — Route: The Framework Map
The routing index — 30+ articles indexed with framework name, importance, "retrieve when" triggers, tags, related concepts, and URLs — provides information scent before the first search.10
The agent reads this map before searching. Instantly knows: Lane Doctrine exists, Simplicity Inversion exists, Context Engineering exists — and when each is relevant.
Compare to cold-start research: "Uh... maybe I should search for AI governance? AI deployment? AI strategy?" The routing index collapses this to: "Lane Doctrine applies here. Let me search for Lane Doctrine content."
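In data terms the routing index is nothing exotic: a list of small entries the agent reads before its first search. A minimal sketch follows, with the field names taken from the description above and the entries, triggers, and URLs invented for illustration:

```python
# A routing index as plain data: the map the agent reads before its first
# search. Field names follow the description above; the entries, triggers,
# and URLs are invented for illustration.
ROUTING_INDEX = [
    {"name": "Lane Doctrine",
     "importance": "high",
     "retrieve_when": "questions about AI governance, autonomy boundaries, risk tiers",
     "tags": ["governance", "autonomy", "risk"],
     "related": ["Three-Tier Error Budgets"],
     "url": "https://example.com/lane-doctrine"},          # placeholder URL
    {"name": "Context Engineering",
     "importance": "high",
     "retrieve_when": "questions about context quality, attention budget, signal density",
     "tags": ["context", "attention budget", "signal density"],
     "related": ["Sub-agent compression"],
     "url": "https://example.com/context-engineering"},    # placeholder URL
]

def route(question: str, index=ROUTING_INDEX) -> list[dict]:
    """Route stage: broad recall. Keep any entry whose tags or 'retrieve when'
    triggers overlap the question; a real system lets the model do this read."""
    q = question.lower()
    return [entry for entry in index
            if any(tag in q for tag in entry["tags"])
            or any(trigger in q for trigger in entry["retrieve_when"].lower().split(", "))]

print([entry["name"] for entry in route(
    "How should we set an attention budget for agent context?")])
# -> ['Context Engineering']
```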
Stage 2 — Explore: Iterative Deepening
The agent uses canonical terms from the routing index to search. Each search is informed by previous findings — iterative deepening, not single-shot.21
Example chain: Agent sees "Context Engineering" in the index → searches → finds the "attention budget" concept2 → realises this connects to "signal density" → searches again with refined terms → finds the sub-agent compression pattern → follows THAT thread to Anthropic's multi-agent research.15 Each retrieval sharpens the next query.
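That chain can be expressed as a simple loop: search, collect findings, harvest the new canonical terms they surface, and feed those terms into the next round. A sketch with a canned stand-in for the retrieval tool; the term chain mirrors the example above, everything else is illustrative:

```python
# Iterative deepening: each retrieval sharpens the next query. The canned
# FAKE_RESULTS stand in for a real retrieval tool (RAG, web search, file grep).

FAKE_RESULTS = {  # query term -> findings it would surface (illustrative)
    "context engineering": [{"note": "attention budget", "terms": ["attention budget"]}],
    "attention budget":    [{"note": "signal density", "terms": ["signal density"]}],
    "signal density":      [{"note": "sub-agent compression", "terms": []}],
}

def search_tool(term: str) -> list[dict]:
    """Placeholder retrieval call; a real one would hit a vector store or search API."""
    return FAKE_RESULTS.get(term, [])

def iterative_deepening(seed_terms, max_rounds=4):
    seen, frontier, evidence = set(seed_terms), list(seed_terms), []
    for _ in range(max_rounds):
        if not frontier:
            break                                  # nothing new surfaced: stop
        next_frontier = []
        for term in frontier:                      # each retrieval sharpens the next query
            for finding in search_tool(term):
                evidence.append(finding)
                next_frontier += [t for t in finding["terms"] if t not in seen]
        seen.update(next_frontier)
        frontier = next_frontier
    return evidence

print([f["note"] for f in iterative_deepening(["context engineering"])])
# -> ['attention budget', 'signal density', 'sub-agent compression']
```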
Stage 3 — Judge: Dual-Query Filtering
Retrieval query: "context engineering attention budget signal density" (broad, recall-heavy)
Goal spec: "How does context quality vs quantity affect the cognition supply chain thesis?" (rich, intention-heavy)
The judge filters retrieved chunks against the actual research question — not just semantic similarity.7 Chunks that match the keywords but don't serve the thesis get filtered out.
Stage 4 — Compress: Thousands of Pages → 436 Lines
The research file for this ebook: 436 lines distilled from thousands of pages across 30+ articles, Anthropic's engineering blog, academic papers, and API documentation.23
Each finding: quote + source + analysis + connection to the article thesis. The thousands of pages of intermediate reading vanish. The 436 lines of curated evidence remain.
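Keeping each retained finding in a fixed shape is what lets thousands of pages compress into a few hundred citable lines. A sketch of one such record: the field names mirror the sentence above, the quote and URL come from the Anthropic source cited in this ebook, and the analysis and connection text is illustrative.

```python
# One compressed research finding: quote + source + analysis + connection.
# Field names mirror the sentence above; analysis/connection text is illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    quote: str        # verbatim evidence from the source
    source: str       # where it came from (URL or citation)
    analysis: str     # what the quote means in its own context
    connection: str   # why it matters for this document's thesis

example = Finding(
    quote="The essence of search is compression.",
    source="https://www.anthropic.com/engineering/multi-agent-research-system",
    analysis="Search is framed as distilling insight from a large corpus.",
    connection="Grounds the Compress stage of the supply chain.",
)
print(example.quote, "->", example.connection)
```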
Stage 5 — Compound: A New Framework Emerges
Writing this ebook revealed the "Cognition Supply Chain" as a new named framework. It didn't exist before this research cycle. The routing index gains a new entry. The framework library gains a new framework. The next ebook starts from a higher baseline.22
"The framework you build today solves problems you haven't encountered yet."
The Live Demonstration — This Conversation
The proof is in the source material for this ebook. A recorded conversation where frameworks were described (compressed context), the AI explored the website and followed cross-references between articles, and each exchange deepened understanding. The AI's responses became more specific, more framework-grounded, more nuanced with every turn.
By the end: the AI had synthesised "cognition supply chain" as a concept — something NEITHER participant started with. It emerged from the compounding exploration loop.22 That's the flywheel in action: the conversation itself was a cognition supply chain cycle.
What makes this remarkable: the model's training data predates most of these frameworks. It had never "seen" Lane Doctrine or Simplicity Inversion in training. Yet with the routing index and framework access, it reasoned with these concepts expertly.5 The map made the model a specialist.
Research Maturity Progression
The retrieval maturity ladder (Chapter 2) applied specifically to research workflows:
Level 1: Manual Search
Read articles. Copy quotes into a document. Hope you don't miss anything important. Exhausting. Error-prone. Doesn't scale.
Level 2: RAG Over Knowledge Base
Better recall, but chunk-level — misses cross-document connections and dependency chains.12 Most "we do research with AI" organisations sit here.
Level 3: Agentic RAG
The agent reformulates queries, spots gaps, follows leads.8 Still bounded by chunk similarity, but vastly better than single-shot. The routing index alone gets you here.
Level 4: Full Research Supply Chain
Routing index → agentic exploration with tool access → dual-query judgment → sub-agent compression → compiled research artefact. This is the moat.
The highest-ROI upgrade: going from Level 2 to Level 3 by adding a routing index. A few hours of work that permanently transforms research quality. The full Level 4 supply chain is the long-term goal; the routing index is the starting point.
"AI Asking AI About AI" — Solved
The original problem ("AI asking AI about AI doesn't work") is now solved. Not by making the model smarter, but by giving it a map of what exists, the tools to explore, and a judgment layer that filters for relevance.
The insight generalises beyond AI research. Any "unstable domain" — where model training data is stale and the search space is unbounded — benefits from the same supply chain.6 Regulatory compliance. Emerging technology assessment. Competitive analysis. Market strategy. Anywhere the ground shifts faster than models can be retrained.
The existence of this ebook is the meta-proof. Written using the supply chain, about the supply chain. If the architecture didn't work, you'd be reading generic AI advice. Instead, every chapter is grounded in specific frameworks, backed by specific evidence, and structured through specific patterns — all fed through the pipeline.
From Examples to Implementation
Part III has shown the supply chain applied to two domains: proposals (Chapter 7) and research (this chapter). Same architecture, different content. The final chapter provides a practical implementation guide — how to build your first cognition supply chain, stage by stage, starting with the routing index.
Key Takeaways
- • "AI asking AI about AI" fails without a supply chain — the domain is too unstable, the search space too unbounded
- • The routing index bridges the stale-training gap by providing vocabulary and topology the model lacks
- • Iterative deepening (not single-shot search) enables compounding understanding across exploration cycles
- • The counter-factual test proves it: same model, with vs without supply chain = radically different quality
- • Any unstable domain benefits from the same architecture — this isn't AI-specific
Building Your First Supply Chain
Build your routing index this week. Start with 10 framework one-liners and a “retrieve when” trigger for each.
You don't need the full five-stage pipeline on day one. You need the minimum viable supply chain — and the discipline to compress each cycle's learning back into the system.
This chapter maps the implementation path from "we stuff context into system prompts" to "we run a compounding cognition engine." Each stage builds on the last. Start this week.
The Minimum Viable Supply Chain
What you actually need to start: a routing index + agentic search + one judgment layer. That's it. No sub-agent orchestration required on day one. No fancy tooling. No custom infrastructure. A markdown file with framework entries. An AI tool that can search. A habit of separating your search query from your goal.
The 80/20 insight: before the routing index, the agent searches with problem terms and gets generic results. After the routing index, the agent searches with canonical terms and gets targeted results.10 This single change transforms output quality more than upgrading your model.6
Stage 1 — Build Your Routing Index (This Week)
Start with what you know. List your 10–15 most important concepts, frameworks, documents, or knowledge assets. You already know what matters — you just haven't written it down in a machine-readable format.
The minimum entry for each item, shown here as a worked example for a customer onboarding document:
- One-liner: 7-step onboarding workflow covering account setup through first value delivery
- Retrieve when: Questions about customer setup, activation, time-to-value, churn reduction
- Tags: onboarding, customer success, activation, churn, first value
- Location: /docs/processes/customer-onboarding-v3.md
Iteration rule: After each significant AI interaction, ask: "Was there a concept the agent should have known about but didn't?" If yes, add a routing entry. The index grows organically. 5 minutes per new entry.
Stage 2 — Wire Search to the Routing Index
The simplest implementation: before your AI searches, tell it to read the routing index first. A single instruction in your system prompt or briefing:
"Before you search, read the routing index. Use the canonical terms and tags from the index to formulate your search queries."
This alone transforms query quality from problem-terms to corpus-terms. With basic RAG, the routing index becomes a query expansion source — agent reads index, identifies relevant entries, searches with canonical terms. Hits are guaranteed because the terms come FROM the corpus.
With agentic search, the routing index becomes the starting map.13 Agent reads index, selects promising branches, explores those documents first, follows cross-references. The key discipline: the agent reads the MAP before it searches the TERRITORY. Non-negotiable regardless of retrieval strategy.
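The wiring itself is small: read the index, find the entries whose triggers match, and build the search query from their canonical terms instead of the user's problem phrasing. A sketch under those assumptions; the entry, instruction wording, and helper names are illustrative:

```python
# Wire search to the routing index: the agent reads the MAP (index) before
# the TERRITORY (corpus). Entry and helper names are illustrative.
ROUTING_INDEX = [
    {"name": "Three-Tier Error Budgets",
     "retrieve_when": "error tolerance, failure modes, zero-tolerance stakeholders",
     "tags": ["error budgets", "failure", "governance"]},
]

SYSTEM_INSTRUCTION = (  # the one-line briefing instruction described above
    "Before searching, read the routing index. Use the canonical terms and "
    "tags from the index to formulate your search queries."
)

def expand_query(problem_statement: str, index=ROUTING_INDEX) -> str:
    """Query expansion: swap problem phrasing for corpus vocabulary taken from
    the matching index entries, so hits use the corpus's own terms."""
    words = problem_statement.lower()
    terms = []
    for entry in index:
        if any(tag in words for tag in entry["tags"]) \
           or any(t.strip() in words for t in entry["retrieve_when"].split(",")):
            terms.append(entry["name"])
            terms.extend(entry["tags"])
    return " ".join(dict.fromkeys(terms))   # de-duplicate, keep order

print(expand_query("Our client is worried about failure and error tolerance"))
# -> "Three-Tier Error Budgets error budgets failure governance"
```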
Stage 3 — Add the Dual-Query Pattern
When to add this: When your search results are "relevant but not quite right" — in the right neighbourhood but not serving your specific goal.
Implementation is a habit change, not an infrastructure change:
Step 1: Write Two Things
A search query (broad, keyword-focused, optimised for recall) and a goal statement (rich, specific, describing exactly what you need and why).
Step 2: Search With the Query
Retrieve candidates using the broad query. Cast a wide net.
Step 3: Judge Against the Goal
Send candidates + goal statement to the model: "Given this goal, which of these results are most relevant? Why?"
Dual-Query Example: Insurance Client
Search Query (Broad)
"error handling AI systems failure modes"
Goal Statement (Specific)
"Our insurance client has a zero-tolerance board. We need frameworks for pre-negotiating error budgets so one mistake doesn't kill the project."
Judge filters for: Three-Tier Error Budgets, the One-Error Death Spiral pattern1 — not generic error handling advice. The insurance context stays in the goal, not the search.
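Put together, the pattern is two strings and one judging call. A sketch using the insurance example above; the retrieve and llm functions are placeholders for whatever retrieval backend and model API you actually use:

```python
# Dual-query pattern: search with a broad query, judge candidates against a
# rich goal. `retrieve` and `llm` are placeholders for your retrieval backend
# and model API; the canned strings are illustrative.

SEARCH_QUERY = "error handling AI systems failure modes"          # broad, recall-heavy
GOAL = ("Our insurance client has a zero-tolerance board. We need frameworks "
        "for pre-negotiating error budgets so one mistake doesn't kill the project.")

def retrieve(query: str) -> list[str]:
    """Placeholder: return candidate chunks for the broad query."""
    return ["Three-Tier Error Budgets ...",
            "Generic try/except advice ...",
            "The One-Error Death Spiral ..."]

def llm(prompt: str) -> str:
    """Placeholder for a model call; swap in a real API client here."""
    return "[0] and [2] serve the goal; [1] matches keywords but stays generic."

def judge(candidates: list[str], goal: str) -> str:
    prompt = (f"Goal: {goal}\n\n"
              + "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
              + "\n\nGiven this goal, which of these results are most relevant? Why?")
    return llm(prompt)   # filter for goal fit, not keyword similarity

print(judge(retrieve(SEARCH_QUERY), GOAL))
```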
The habit: every time you search, ask yourself — "Am I searching with problem keywords, or am I searching with one query and judging with another?"7 Separate the two.
Stage 4 — Add Sub-Agent Compression
When to add this: When your research or exploration generates too much volume for the main context to handle effectively.2 When you notice the model losing coherence because it's processing too many findings at once.
Most AI coding tools already support sub-agent or sub-task patterns. Give the sub-agent a briefing packet (Chapter 5): Mission, Non-goals, Constraints, Client context, Framework anchors, Deliverable format.18 Let it explore. Receive the compressed output. Read THAT, not the exploration trace.
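The briefing packet is just a fixed set of fields rendered into the sub-agent's prompt, with the rule that only the compressed deliverable comes back into the main context. A sketch under those assumptions; the field names follow the list above, and the dispatch call is a placeholder for whatever sub-task mechanism your tool provides:

```python
# Sub-agent compression: hand over a briefing packet, get back only the
# compressed deliverable, never the exploration trace. Field names follow the
# chapter's list; the dispatch mechanism and values are illustrative.
from dataclasses import dataclass

@dataclass
class BriefingPacket:
    mission: str             # what the sub-agent must accomplish
    non_goals: str           # what it must NOT spend tokens on
    constraints: str         # hard limits (length, tone, scope)
    client_context: str      # the specifics of this engagement
    framework_anchors: str   # which frameworks to apply
    deliverable_format: str  # the shape of the compressed output

def render_prompt(packet: BriefingPacket) -> str:
    """Turn the packet into the sub-agent's briefing text."""
    return "\n".join(f"{field.replace('_', ' ').title()}: {value}"
                     for field, value in vars(packet).items())

def dispatch(packet: BriefingPacket) -> str:
    """Placeholder: a real implementation would send render_prompt(packet) to a
    sub-agent with its own context window and return only its final summary."""
    return f"[compressed deliverable for: {packet.mission}]"

packet = BriefingPacket(
    mission="Apply Three-Tier Error Budgets to the insurance client brief",
    non_goals="Do not restate the framework; do not cover unrelated risks",
    constraints="Max 500 words, cite sources",
    client_context="Insurance client with a zero-tolerance board",
    framework_anchors="Three-Tier Error Budgets",
    deliverable_format="Finding + evidence + recommendation",
)
print(dispatch(packet))   # the main context reads THIS, not the exploration trace
```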
Stage 5 — Close the Loop (From Day One)
This isn't an advanced stage; it's a discipline that should start from the first routing index. After each significant cycle, ask: what did this cycle teach us? Was there a concept the agent should have known about but didn't? What belongs back in the routing index or framework library?
The compound effect: Cycle 1 has 10 routing entries. Cycle 5 has 25. Cycle 20 has 50. Each entry represents institutional learning that makes every future interaction more efficient.22
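The discipline fits in a few lines of habit: after a cycle, if a gap surfaced, append an entry. A sketch in the same spirit; the entry shape matches Stage 1 and the specific values are illustrative:

```python
# Close the loop: after each significant cycle, compress what was learned into
# a new routing entry. The gap check is the human question reduced to a flag;
# the entry content is illustrative.
routing_index: list[dict] = []   # imagine ~10 existing entries at cycle 1

def close_the_loop(index: list[dict], *, gap_found: bool,
                   name: str, one_liner: str, retrieve_when: str) -> None:
    """Five-minute post-cycle discipline: if the agent lacked a concept it
    should have known, add a routing entry so the next cycle starts higher."""
    if gap_found:
        index.append({"name": name,
                      "one_liner": one_liner,
                      "retrieve_when": retrieve_when})

close_the_loop(routing_index, gap_found=True,
               name="Cognition Supply Chain",
               one_liner="Route, explore, judge, compress, compound",
               retrieve_when="questions about context architecture or generic AI output")
print(f"{len(routing_index)} routing entries")   # grows by one per captured learning
```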
The Supply Chain Implementation Path
| Stage | What to Build | When to Add | Time |
|---|---|---|---|
| 1. Route | Routing index (10–15 entries) | This week | 1–2 hours initial |
| 2. Search | Wire search to read index first | Same week | 30 minutes |
| 3. Judge | Dual-query pattern | When results are "close but not right" | Habit change |
| 4. Compress | Sub-agent with briefing packet | When research volume exceeds context | 1 hour setup |
| 5. Compound | Post-cycle compression discipline | From day one | 5 min per cycle |
When NOT to Use the Full Supply Chain
Not everything needs this. Honest assessment:
✗ Skip the Supply Chain When
- • Simple queries — "What's the syntax for X?" Just ask the model.
- • Stable domains — Blue-whale-type questions where training data is sufficient.
- • Low-stakes, one-off tasks — If the output doesn't need to be domain-specific or deeply researched.
✓ Use the Supply Chain When
- • Unstable domains — fast-moving fields, stale training data.
- • Dependency-shaped knowledge12 — understanding requires cross-document traversal.
- • Client-specific output — generic advice isn't good enough.
- • Repeated workflows — the compounding loop pays for setup.
- • Institutional knowledge — the answer lives in YOUR knowledge base.
The decision rule: if your outputs are consistently brilliant without architecture, you don't need this. If they're consistently generic despite using frontier models, you need this yesterday.
Start Building Today
You don't need a smarter model. You need a cognition supply chain. The architecture determines output quality. The architecture compounds. Build your routing index this week — 10 entries, 1–2 hours. It will transform every AI interaction from that point forward.
The competitive moat isn't which model you use. It's which supply chain you've built.
Key Takeaways
- • Start with the routing index — 80% of the quality improvement for 1–2 hours of work
- • Wire search to the index before searching the territory (non-negotiable discipline)
- • Add dual-query pattern when results are "close but not right"
- • Add sub-agent compression when research volume exceeds main context capacity
- • Close the loop from day one — 5 minutes of post-cycle compression enables compounding
- • Not everything needs the full supply chain — use it for unstable domains, dependency-shaped knowledge, and repeated workflows
References & Sources
This ebook draws on three categories of sources: primary research from AI engineering teams and academia, industry analysis from search and retrieval platforms, and the author's practitioner frameworks developed through enterprise AI transformation consulting. External sources are cited formally throughout. Author frameworks are presented as interpretive analysis and listed here for readers who want to explore the underlying thinking.
Numbered Citations
[1] Root Causes of Failure for Artificial Intelligence Projects
RAND Corporation research showing AI projects fail at twice the rate of non-AI IT projects, with failure driven primarily by governance and organizational issues rather than technical capability limitations.
https://www.rand.org/pubs/research_reports/RRA2680-1.html
[2] Effective Context Engineering for AI Agents
Anthropic research establishing the attention budget principle: context must be treated as finite resource with diminishing marginal returns. LLMs have working memory capacity limits analogous to human cognition.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[3] On The Computational Complexity of Self-Attention (arXiv:2209.04881)
Academic research demonstrating self-attention scales quadratically with sequence length. Doubling context size quadruples computational load and attention diffusion.
https://arxiv.org/abs/2209.04881
[4] Context Quality Over Quantity
Anthropic research demonstrating that smaller, cleaner context outperforms larger, noisier context due to attention diffusion effects in transformer models.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[5] Worldview Recursive Compression
LeverageAI framework for compiling domain expertise into reusable frameworks that serve as "kernel" for AI interactions. Demonstrates how compressed worldview architecture enables compounding returns.
https://leverageai.com.au/worldview-recursive-compression-how-to-better-encompass-your-worldview-with-ai/
[6] Architecture Over Model Capability
Andrew Ng's research demonstrating GPT-3.5 with agentic architecture outperforms GPT-4 alone, proving that system design matters more than raw model capability.
https://www.insightpartners.com/ideas/andrew-ng-why-agentic-ai-is-the-smart-bet-for-most-enterprises/
[7] Tavily AI-Powered Search for Developers
Two-step search-then-extract flow optimized for LLM ingestion. Demonstrates the dual-query pattern (search for candidates, extract for relevance) as a commercial product validating Stages 2 + 3 of the cognition supply chain.
https://www.tavily.com/blog/tavily-101-ai-powered-search-for-developers
[8] Claude Web Search Tool
Agentic search loop where Claude decides when to search, with the API running searches that can repeat multiple times per request. Industry validation that retrieval is moving from single-shot to exploration-based patterns.
https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
[9] Building Effective Agents
Anthropic research showing multi-agent orchestration achieves 90.2% success rate versus 14-23% for single-agent approaches on SWE-bench benchmarks. Validates the sub-agent compression architecture (Stages 4 + 5).
https://www.anthropic.com/research/building-effective-agents
[10] Information Foraging: A Theory of How People Navigate on the Web
Peter Pirolli and Stuart Card's foundational theory from Xerox PARC explaining how agents (human and AI) navigate knowledge spaces by following "information scent" — cues that signal whether a path leads to valuable content. Provides the academic grounding for routing indexes as navigation layers that maximize value per unit of exploration effort.
https://www.nngroup.com/articles/information-foraging/
[11] RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Academic research (Sarthi et al., arXiv:2401.18059) demonstrating hierarchical retrieval architecture using tree-structured summaries that retrieve across levels of abstraction. Addresses the global context loss problem inherent in flat chunking approaches. Routing indexes represent a lightweight, human-curated implementation of similar hierarchical navigation principles.
https://arxiv.org/abs/2401.18059
[12] Microsoft RAG Chunking & Parent-Child Retrieval
Microsoft Azure Architecture guidance on semantic chunking limitations in standard RAG systems. Documents how parent-child retrieval patterns address the fundamental tradeoff: small chunks optimize for semantic matching but lose global context, while hierarchical strategies preserve document structure and cross-reference relationships.
https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/rag/rag-chunking-phase
[13] Agentic File Search Pattern
Three-phase exploration architecture (scan, deep dive, backtrack) applying coding agent patterns to document retrieval. Demonstrates how agents can follow cross-document references rather than relying solely on semantic similarity, addressing RAG's fundamental inability to traverse dependency-shaped knowledge.
https://github.com/PromptEngineer48/agentic-file-search
[14] LlamaIndex File-Based Agents
LlamaIndex documentation and benchmarks showing file-explorer agents match RAG quality for complex queries while trading latency for depth. Validates the hybrid architecture: RAG for real-time single-document queries, exploration agents for cross-document dependency analysis in background/async workflows.
https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_with_llamaindex/
[15] Building Effective Multi-Agent Research Systems
Anthropic engineering research establishing the foundational principle: "The essence of search is compression: distilling insights from a vast corpus." Documents how subagents facilitate compression by operating in parallel with their own context windows, exploring different aspects of questions simultaneously.
https://www.anthropic.com/engineering/multi-agent-research-system
[16] Parallel Tool Calling Research Time Reduction
Anthropic research demonstrating that parallel tool calling in multi-agent systems cuts research time by up to 90% for complex queries. Validates the token economics argument for sub-agent compression patterns.
https://www.anthropic.com/engineering/multi-agent-research-system
[17] Multi-Agent Systems for Breadth-First Queries
Anthropic internal evaluations showing multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously. Demonstrates when to use scatter-gather patterns.
https://www.anthropic.com/engineering/multi-agent-research-system
[18] Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20 Again
LeverageAI framework establishing the ephemeral sandbox pattern for sub-agents using the fork-exec pattern from Unix process isolation. Demonstrates how context isolation provides quality control by architecture rather than by hope.
https://leverageai.com.au/context-engineering-why-building-ai-agents-feels-like-programming-on-a-vic-20-again/
[19] DeepMind - AlphaGo & Monte Carlo Tree Search
AlphaGo demonstrated chess-style search: neural networks combined with Monte Carlo tree search, exploring promising moves deeply through guided simulation and position evaluation. Validates move-ordering heuristics as fundamental to efficient search through large decision spaces.
https://www.deepmind.com/research/highlighted-research/alphago
[20] Multi-Agent Research Systems Performance
Anthropic research showing multi-agent research systems excel for breadth-first queries that pursue multiple independent directions simultaneously. Parallel tool calling cuts research time by up to 90% for complex queries by enabling agents to explore different aspects in parallel.
https://www.anthropic.com/engineering/multi-agent-research-system
[21] Monte Carlo Tree Search (MCTS)
MCTS algorithm designed for problems with extremely large decision spaces like Go (10^170 possible states). Instead of exploring all moves, MCTS incrementally builds a search tree using random simulations to guide decisions, demonstrating iterative deepening principles.
https://www.geeksforgeeks.org/machine-learning/ml-monte-carlo-tree-search-mcts/
[22] LeverageAI: The AI Learning Flywheel
LeverageAI research demonstrating that 1% daily improvement applied to an improved baseline equals 3,778% better performance after one year through compounding returns. Framework validates the kernel flywheel mechanism where each cycle starts from a higher baseline than the last.
https://leverageai.com.au/wp-content/media/The_AI_Learning_Flywheel_ebook.html
[23] The Essence of Search is Compression
Anthropic research establishing the foundational principle that search fundamentally involves compression - distilling insights from a vast corpus. Subagents facilitate this compression by operating in parallel with their own context windows, exploring different aspects of questions simultaneously and enabling 10x compression ratios (200K → 20K tokens) with intelligence preserved.
https://www.anthropic.com/engineering/multi-agent-research-system
[24] Andrew Ng - Agentic Workflows
Andrew Ng's research demonstrating that GPT-3.5 with agentic architecture outperforms GPT-4 alone, proving that system design and architectural patterns matter more than raw model capability. Validates the thesis that architecture compounds while model upgrades provide only linear improvements.
https://www.insightpartners.com/ideas/andrew-ng-why-agentic-ai-is-the-smart-bet-for-most-enterprises/
[25] The Use of MMR, Diversity-Based Reranking for Reordering Documents
Carbonell & Goldstein's foundational 1998 research introducing the Maximum Marginal Relevance (MMR) algorithm. Computes a score balancing relevance and diversity, preventing redundancy while maintaining query relevance in search results. Formula: MMR = λ × relevance_score − (1 − λ) × max(similarity_with_selected_docs). Widely adopted for search result diversification, summarization, and recommendation systems.
https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf
[26] MIT Sloan Review: The End of Scale
MIT Sloan Management Review research establishing that AI enables economies of specificity replacing economies of scale. When cost of key inputs (cognition, information) drops dramatically, economic structure reorganizes around new abundance, making customization cheaper than standardization. Structural shift as fundamental as agrarian to industrial transition.
https://sloanreview.mit.edu/article/the-end-of-scale/
[27] MIT Sloan Review: Economies of Specificity Applied to Consulting
MIT Sloan analysis demonstrating how industrial-era economics favored standardization while AI-era economics enables perfect customization at lower cost than generic materials. When cognition is cheap and parallel, the economics flip from one-niche-one-template to individualized compilation.
https://sloanreview.mit.edu/article/the-end-of-scale/
[28] Consulting Success: AI Business Models and Best Practices
Consulting Success industry analysis showing AI reduces proposal development time by 70% while improving win rates. Traditional manual framework selection and proposal writing requires 2-3 consultant-days versus hours for AI-powered cognition supply chain approach, demonstrating dramatic efficiency gains in professional services.
https://www.consultingsuccess.com/consulting-business-models
[29] Qatalyst: AI-Powered Proposal Generator Case Study
Qatalyst case study demonstrating AI proposal system used to develop over 50 proposals across range of sectors. System significantly reduced revisions and demonstrated scalable infrastructure where marginal cost drops while quality increases after initial setup investment in routing indexes and briefing protocols.
https://qatalyst.ca/case-studies/study/proposal-generator
[30] Boutique Consulting Club: Win Rate Analysis
Boutique Consulting Club industry analysis showing custom, well-researched proposals achieve 60-90% win rate versus 20-30% for generic materials. Demonstrates that meta-credibility from bespoke proposal generation provides measurable competitive advantage in professional services market.
https://www.boutiqueconsultingclub.com/blog/win-rate
Primary Research
Effective Context Engineering for AI Agents
Attention budget principle: context as finite resource with diminishing marginal returns. LLMs have working memory capacity limits analogous to human cognition.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
On The Computational Complexity of Self-Attention (arXiv:2209.04881)
Self-attention scales quadratically with sequence length. Doubling context size quadruples computational load and attention diffusion.
https://arxiv.org/abs/2209.04881
Building Effective Agents
Multi-agent orchestration research showing 90.2% success rate vs 14–23% for single-agent approaches on SWE-bench benchmarks.
https://www.anthropic.com/research/building-effective-agents
Multi-Agent Research Systems
"The essence of search is compression: distilling insights from a vast corpus." Parallel tool calling cuts research time by up to 90% for complex queries. Sub-agents facilitate compression by operating with their own context windows.
https://www.anthropic.com/engineering/multi-agent-research-system
Information Foraging: A Theory of How People Navigate on the Web
Peter Pirolli and Stuart Card's foundational theory explaining how agents follow "information scent" to maximise value per unit of exploration effort. Academic grounding for routing indexes as navigation layers.
https://www.nngroup.com/articles/information-foraging/
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (arXiv:2401.18059)
Hierarchical retrieval with tree-structured summaries that retrieve across levels of abstraction, addressing chunking's global context loss.
https://arxiv.org/abs/2401.18059
Industry Analysis & Platforms
Tavily: AI-Powered Search for Developers
Two-step search-then-extract flow optimised for LLM ingestion. Demonstrates the dual-query pattern (search for candidates, extract for relevance) as a commercial product.
https://www.tavily.com/blog/tavily-101-ai-powered-search-for-developers
Claude Developer Platform: Web Search Tool
Agentic search loop where Claude decides when to search, repeating multiple times per request. Industry validation that retrieval is moving from single-shot to exploration.
https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
LeverageAI / Scott Farrell
Practitioner frameworks and interpretive analysis developed through enterprise AI transformation consulting. These frameworks are presented as the author's voice throughout the ebook and listed here for readers who want to explore the underlying articles.
Worldview Recursive Compression
The kernel flywheel: compile domain expertise into frameworks that serve as reusable source code for AI interactions. The compounding loop described in Chapter 6.
https://leverageai.com.au/worldview-recursive-compression-how-to-better-encompass-your-worldview-with-ai/
Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20 Again
Treat LLM context like OS memory management. Tiered memory hierarchy, signal density over volume, sub-agents as ephemeral sandboxes. Foundation for the compression architecture in Chapter 5.
https://leverageai.com.au/context-engineering-why-building-ai-agents-feels-like-programming-on-a-vic-20-again/
Discovery Accelerators: The Path to AGI Through Visible Reasoning Systems
Chess-style search through idea space with move-ordering heuristics. Systematic exploration at ~100 nodes/minute. The exploration pattern referenced in Chapter 6.
https://leverageai.com.au/discovery-accelerators-the-path-to-agi-through-visible-reasoning-systems/
The Fast-Slow Split: Breaking the Real-Time AI Constraint
Separate the talker from the thinker. Exploration is the slow lane doing heavy cognition; interactive layers are the fast lane. Referenced in Chapter 4's hybrid architecture.
https://leverageai.com.au/the-fast-slow-split-breaking-the-real-time-ai-constraint/
Stop Picking a Niche: Send Bespoke Proposals Instead
Marketplace of One — AI inverts customisation economics so bespoke proposals are cheaper than generic sales materials. Applied in Chapter 7's proposal supply chain.
https://leverageai.com.au/stop-picking-a-niche-send-bespoke-proposals-instead/
Knowledge is a Tool: RAG for Agentic Systems
RAG fundamentals that the cognition supply chain builds upon. Single-document agentic RAG patterns.
https://leverageai.com.au/knowledge-is-a-tool-rag-for-agentic-systems/
Micro-Agents, Macro-Impact
Router/Supervisor/Worker architecture for composable AI agents. The micro-agent pattern underpins the sub-agent compression architecture.
https://leverageai.com.au/micro-agents-macro-impact-why-small-composable-ai-agents-beat-one-mega-brain/
The Three Ingredients Behind Unreasonably Good AI Results
Agency + Tools + Orchestration = compounding returns. The three requirements for the cognition supply chain to function.
https://leverageai.com.au/the-three-ingredients-behind-unreasonably-good-ai-results/
Breaking the 1-Hour Barrier: AI Agents That Build Understanding Over 10+ Hours
Stateless workers + stateful orchestration for extended AI sessions. The long-running agent patterns that enable deep supply chain cycles.
https://leverageai.com.au/breaking-the-1-hour-barrier-ai-agents-that-build-understanding-over-10-hours/
Progressive Resolution: The Diffusion Architecture for Complex Work
Coarse-to-fine resolution layers. The progressive refinement pattern that mirrors routing index → exploration → compression.
https://leverageai.com.au/progressive-resolution-the-diffusion-architecture-for-complex-work/
Note on Research Methodology
Research for this ebook was conducted using the cognition supply chain architecture it describes. A routing index of 30+ framework articles provided information scent for agentic RAG search. External sources (Anthropic engineering blog, academic papers, API documentation) were retrieved through dual-query patterns and judged against the ebook's thesis. Findings were compressed into a structured research artefact before chapter writing began.
All statistics and quotes are attributed to their original sources. Author frameworks are presented as interpretive analysis informed by enterprise AI consulting practice, not as independent research findings.
Compiled February 2026. Some links may require subscription access or may have been updated since compilation.