The Blur Is Load-Bearing: A Resolution Ladder for Reading, Not Writing

Scott Farrell · July 4, 2026 · scott@leverageai.com.au

AI Architecture · LeverageAI · Progressive Resolution vol. 2

Progressive Resolution built the write side. Invert it for reading, and one property does all the work: resolution correlates inversely with staleness risk — so the layers cheap enough to cache are exactly the layers that never rot.

Scott Farrell — LeverageAI · A field guide for AI architects & platform engineers · ~11 min read

TL;DR

  • An agent reading a huge, changing corpus has two ways to fail: a stale index (embeddings from last week, confidently wrong today) or a blown context budget (trying to hold the live source). Both come from caching at the wrong resolution.
  • Build a five-layer read ladder. L0 map → L1 wiki pages (claims + edges, deliberately blurry) → L2 skeleton regenerated on demand → L3 grep → L4 full source. Cost rises as you descend; the agent’s judgment decides how far each question deserves to go.
  • The elegant property: what rots fastest is never cached. Signatures, line numbers and bodies are regenerated fresh or read live; the cache holds only relationships, which stay directionally true as detail churns. A stale number is a wrong answer; a stale relationship is still directionally true — and that falls out of the architecture, not a policy.

You can already point ChatGPT or Claude at your Gmail and ask it to find the invoice from June about electricity. It works. Now ask it the other kind of question — the one where you can’t remember the person’s name, but you had a conversation with them a couple of years ago about something adjacent to what you’re working on now. It falls over. Same inbox, same model, wildly different outcome. The first is a lookup; the second is a relationship query. And the reason every connector nails the first and whiffs the second is structural: they all ship the bottom of a ladder and nothing above it.

The MCP connectors, the vendor copilots, the “search your documents” features — they concede this by their shape. They give an agent deterministic access to fetch and grep the live source: the bottom two rungs. What they can’t give it is a map. The invoice-in-June query works because a keyword hits a document. The vague one fails because relationships between things were never compiled, and you can’t retrieve what was never built.

This article is about the rest of the ladder — the rungs above the search box. It’s the second half of an argument I started on the generation side, so let me name that first and then turn it inside out.

One paragraph on the write side, then we invert it

Progressive Resolution is an architecture for writing complex things. Instead of drafting prose left-to-right like a tape, you work the way an image-diffusion model does: start at the lowest resolution (intent, a one-page silhouette), stabilise it, and only then advance to chapter cards, section skeletons, and finally polished sentences — with a gate at each layer and the discipline to back up to the right layer when something breaks, rather than patching expensive prose. Coarse-to-fine, stabilise before you sharpen. That’s the whole write-side ebook in a sentence, and I’m not going to re-derive it.

Now flip it. On the write side you climb the ladder, coarse to fine, committing detail as structure stabilises. On the read side you do the opposite: you hold the finished world at its lowest useful resolution permanently, and you descend — sharpening only the one spot a given question actually needs, then stopping. Writing spends resolution to build. Reading spends it to answer, as sparingly as possible. Same ladder, opposite direction, opposite economics.

The reason this matters is that a stuffed system prompt — the industry’s default answer to “how does the agent know things” — is knowledge frozen at a fixed resolution, chosen at write time, for every future question at once. Whoever wrote it had to guess, in advance, how much detail each future task would need. So it’s simultaneously too much (the agent burns attention on all of it, every turn) and too little (the one detail this task needs was below the cutoff). The read ladder dissolves both ends: resolution is chosen at read time, per question. The prompt is a photograph of the territory. The ladder is the territory with a map on top, and legs to climb down.

The five layers, with their two prices

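Here is the ladder. Read every row twice, once down the cost column and once down the staleness column, because the entire argument lives in how the two line up: the layers that cost little to read are the ones whose contents age slowly, and the layers whose contents would rot fastest are never cached at all.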

| Layer | What it holds | Cost to read | Staleness risk | Cached? |
|---|---|---|---|---|
| L0: Map / index | One screen of pure information scent: what exists, organised, one line each | ~Free | Near-zero (relationships) | Yes, always resident |
| L1: Wiki pages | Claims and typed edges per entity; deliberately blurry, no digits | Low | Low (directional) | Yes |
| L2: Skeleton | Structure without bodies: signatures, headings, shape | Medium | None (made fresh) | No, regenerated on demand |
| L3: Grep | Pinpoint: the exact line, string, or match | Medium–high | None (live) | No, live read |
| L4: Full source | Ground truth: the whole function body, chapter, session segment | Highest | None (live) | No, live read |

Two of those rows aren’t hypothetical. The best coding agents already ship them; they just haven’t been named as one ladder. L2 is Aider’s repo map. It parses your whole codebase with tree-sitter into an abstract syntax tree, ranks the symbols with a PageRank over the reference graph, and emits signatures and structure, not the function bodies: scope-aware rendering, fitted to an adaptive token budget, and rebuilt on demand rather than persisted [1][2]. A skeleton, materialised fresh, thrown away after. L3 is Claude Code’s deliberate refusal to build an index at all. It greps the live source agentically instead, and pays for it in roughly 40% more tokens, on the explicit bet that reading ground truth beats trusting a cache [3]. Cursor sits in between with the piece everyone forgets: a Merkle tree of file hashes, so it can detect which files changed and re-embed only those. That is an entire subsystem that exists purely to fight staleness in a cache of high-resolution chunks [4].
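To make the L2 idea concrete, here is a minimal sketch of a signatures-only skeleton. It uses Python’s standard `ast` module as a stand-in for tree-sitter (an assumption, to keep the example dependency-free), and it does not reproduce Aider’s PageRank ranking or token budgeting. The function name `skeleton` and the sample source are hypothetical.

```python
import ast


def skeleton(source: str) -> list[str]:
    """Return class and function signatures only, never the bodies."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Only direct children are inspected, so definitions nested inside
        # if/try blocks are skipped in this simplified sketch.
        for child in ast.iter_child_nodes(node):
            indent = "  " * depth
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                lines.append(f"{indent}def {child.name}({args})")
                visit(child, depth + 1)
            elif isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}")
                visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return lines


# Hypothetical source file, used only to exercise the sketch.
SAMPLE = '''
class MediaHandler:
    def on_media(self, chunk):
        buffer = chunk.decode()
        return buffer

    def on_stop(self):
        pass

def _play(text):
    print(text)
'''
```

Calling `skeleton(SAMPLE)` yields the shape of the file (one class, two methods, one function) and none of the statements inside them, which is the part that would otherwise churn with every edit.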

Hold that thought about Cursor, because it’s the tell. When you cache the fine-grained layer, you have to build machinery to keep it fresh. When you cache only the coarse layer, you don’t — and the ladder explains why.
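The freshness machinery a fine-grained cache needs can be sketched in a few lines. This is a deliberately flattened, two-level hash tree (leaf hashes plus one root), not Cursor’s actual implementation; the function names and the in-memory `files` dictionaries are assumptions for illustration.

```python
import hashlib


def leaf_hashes(files: dict[str, str]) -> dict[str, str]:
    """Hash each file's content. Keys are paths, values are hex digests."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def root_hash(leaves: dict[str, str]) -> str:
    """Combine the leaf hashes, in sorted path order, into a single root digest."""
    digest = hashlib.sha256()
    for path in sorted(leaves):
        digest.update(path.encode("utf-8"))
        digest.update(leaves[path].encode("utf-8"))
    return digest.hexdigest()


def changed_paths(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Return the paths whose cached chunks would need re-embedding."""
    if root_hash(old) == root_hash(new):
        return set()  # cheap early exit: nothing changed anywhere
    return {path for path in old.keys() | new.keys() if old.get(path) != new.get(path)}
```

Every line of this exists only because the cached layer is high-resolution. A cache that stores nothing below L1 has no equivalent of `changed_paths` to maintain.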

The blur is load-bearing — twice

The word people trip over is “blurry.” It sounds like a defect. It’s the feature, and it earns its keep in two separate ways.

First: compression is the only way it fits. A hundred project folders, seventy articles, years of email — there is no context window that holds all of that at full resolution, and there never will be, because the corpus grows faster than the window. The map fits because it threw away the pixels. L0 is one screen for a hundred projects precisely because it refused to be anything more. You don’t compress the world as a sad compromise against a bigger budget; you compress it because a compressed world is the only kind an agent can hold all of at once and actually reason across.

Second, and less obvious: cross-referencing is possible because you refused resolution. At L1, a page about a Twilio voice project and a page about an OpenAI realtime experiment can sit next to each other and share an edge — both implement barge-in handling — because both have been abstracted to the level where that relationship is even visible. Drop back to full source and the relationship disappears into two thousand lines of unrelated-looking code. The blur is what lets things rhyme. Relationships only exist at the resolution where detail has been boiled off; keep the detail and you keep the two projects looking like strangers.
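A small sketch shows why the shared edge is only discoverable at L1. The `Page` structure, the edge types, and the page names below are hypothetical; the article does not prescribe a schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Page:
    """One L1 wiki page: blurry claims plus typed edges, no digits."""
    name: str
    claims: list[str] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (edge_type, target)


def shared_targets(pages: list[Page]) -> dict[str, list[str]]:
    """Return every edge target that more than one page points at."""
    by_target: dict[str, set[str]] = defaultdict(set)
    for page in pages:
        for _edge_type, target in page.edges:
            by_target[target].add(page.name)
    return {t: sorted(names) for t, names in by_target.items() if len(names) > 1}


# Hypothetical pages for the two projects described above.
PAGES = [
    Page("twilio-voice", ["voice agent over telephony"],
         [("relies-on", "Twilio"), ("implements", "barge-in")]),
    Page("openai-realtime", ["realtime voice experiment"],
         [("relies-on", "OpenAI Realtime"), ("implements", "barge-in")]),
]
```

Here `shared_targets(PAGES)` surfaces `barge-in` as the one thing the two projects have in common. The same lookup over raw source would need to recognise two unrelated implementations as the same idea, which is exactly the comprehension the compiled page already paid for.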

Compression isn’t what you do to the map to make it fit. Compression is what makes it a map.

The property that makes the whole thing hold: staleness ∝ 1 / resolution

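Now the payoff, and it’s the reason to build the ladder this way rather than a dozen others. Look back at the table. Cost rises as you descend. That part is obvious: reading a whole function costs more than reading its signature. But volatility rises as you descend too: the finer the detail, the faster a stored copy of it would go stale. That is not obvious, and it is the entire game. It is also why the table’s staleness column reads “None” for the lower layers: they are never stored, so they never get the chance to rot.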

Think about what actually rots in a corpus that’s being worked on. A function gets renamed — the signature is wrong instantly. A line gets inserted — every line number below it is wrong that second. A body gets refactored — the old text is stale before you finish reading it. The high-resolution facts are the volatile ones; they have a half-life measured in commits. But climb up: the relationship — “this project leans on Twilio for telephony,” “the ingestion agent feeds the janitor” — survives dozens of refactors. It degrades gracefully over months, not instantly on the next edit.
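The difference in half-life is easy to demonstrate. In this sketch a single inserted line invalidates a stored line number while the stored relationship stays true. The source strings and variable names are hypothetical.

```python
def find_line(source: str, needle: str) -> int:
    """Return the 1-based number of the first line containing needle."""
    for number, line in enumerate(source.splitlines(), start=1):
        if needle in line:
            return number
    raise ValueError(f"{needle!r} not found")


BEFORE = "import twilio\n\ndef on_media(chunk):\n    return chunk\n"
AFTER = "import logging\n" + BEFORE  # one line inserted at the top of the file

# High-resolution fact: where the function lives.
cached_line = find_line(BEFORE, "def on_media")  # stored before the edit
live_line = find_line(AFTER, "def on_media")     # read live after the edit

# Low-resolution fact: "this project leans on Twilio".
relationship_still_true = "import twilio" in AFTER
```

After one edit, `cached_line` and `live_line` disagree, so anything stored at that resolution is already a wrong answer, while the relationship claim needed no update at all.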

So the cost ordering and the staleness ordering are the same ordering. Which means the instruction “cache the cheap layers” and the instruction “cache the durable layers” are the same instruction. You never have to sit down and decide what’s safe to cache for freshness — the cost gradient already sorted it for you. Cache L0 and L1. Regenerate L2 fresh every time (it’s deterministic and cheap — tree-sitter, not a language model). Read L3 and L4 live. The perishable layers are never stored, so they can never be stale; the stored layers are the ones that age like relationships, not like line numbers.
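The rule above can be written as a one-line policy. The ordinal ranks below are illustrative assumptions that mirror the table, not measurements, and the `Layer` and `policy` names are hypothetical; the sketch only shows that a single cost threshold is enough to produce the caching plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    read_cost: int   # ordinal rank, 0 = cheapest to read (illustrative)
    volatility: int  # ordinal rank, 0 = slowest to rot (illustrative)
    fresh_via: str   # how the layer is obtained when it is not cached


LADDER = [
    Layer("L0 map", 0, 0, "recompile"),
    Layer("L1 wiki pages", 1, 1, "recompile"),
    Layer("L2 skeleton", 2, 2, "regenerate"),
    Layer("L3 grep", 3, 3, "read live"),
    Layer("L4 full source", 4, 4, "read live"),
]


def policy(layer: Layer, cache_up_to_cost: int = 1) -> str:
    """Cache the cheap layers; everything else is produced fresh."""
    return "cache" if layer.read_cost <= cache_up_to_cost else layer.fresh_via


by_cost = [layer.name for layer in sorted(LADDER, key=lambda l: l.read_cost)]
by_volatility = [layer.name for layer in sorted(LADDER, key=lambda l: l.volatility)]
PLAN = {layer.name: policy(layer) for layer in LADDER}
```

Because `by_cost` and `by_volatility` are the same ordering under these assumptions, the policy never has to look at volatility: the cost threshold alone keeps the perishable layers out of the cache.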

The line the architecture writes for you

A stale number is a wrong answer. A stale relationship is still directionally true. You don’t have to enforce that with a policy — it falls out of caching only the layers that age well.

This is also the correction to the word “cache.” A cache implies the source can cheaply re-derive the identical thing, so freshness is a race. What L0/L1 hold is a compile — comprehension paid once per source, structure added on top (the edges are the added value), then reused across every future question. In the memory hierarchy: model weights are baked and universal, the context window is per-turn, the KV cache is per-session, and this compiled map is the missing tier — durable, personal, semantic long-term memory that happens to be inspectable. It doesn’t race the source. It ages at the speed of meaning, which is slow.

One query, descending exactly as far as it warrants

Enough architecture. Here’s the ladder doing its job on a real question I actually needed answered: “Where have I implemented voice AI, and did any of it handle barge-in — the caller interrupting the agent mid-sentence?” Watch where it stops.

# Q: where is voice AI implemented, and did any of it handle barge-in?

L0  read the map        // ~free, already resident
      → scent hits: voice_ai/, voice_ai2/, twilio-agent/, realtime-exp/
      → 4 candidates out of ~100 folders. no other layer touched yet.

L1  walk those 4 pages    // low cost, cached, one parallel pull
      → claims: "twilio-agent uses Twilio Media Streams for telephony"
      → edge: twilio-agent --[considered]--> barge-in, marked DEFERRED
      → edge: realtime-exp --[relies-on]--> OpenAI Realtime (native interruption)
      ✓ the RELATIONSHIP question is fully answered here. stop climbing down for it.

# but "DEFERRED" is a claim about intent. did the code actually ship it?
# that's a DIGITS question — the map can't answer it, and shouldn't try.

L2  regenerate skeleton(twilio-agent)   // medium, made fresh, never cached
      → tree-sitter → signatures only:
        on_media(chunk), on_stop(), _play(text)  — no interrupt handler present
      → skeleton confirms: no barge-in in the shipped surface.

L3  grep -rnE "interrupt|barge|cancel" twilio-agent/  // live read; -E makes | an alternation
      → 1 hit, in a code comment: "TODO: barge-in — needs Media Stream clear"
      ✓ enough. answer assembled. L4 (full file) never opened.

Count the restraint. The relationship half of the question — where is voice AI, what did each project decide — never left L1. It didn’t need a skeleton, a grep, or a source read, because relationships live in the cached layer and the cached layer had them. Only the digits half — did the interrupt handler actually ship — forced a descent, and it stopped at a one-line grep hit. L4, the full source, stayed shut. The agent paid the high price for exactly one clause of the question and the near-zero price for the rest. That’s the ladder’s discipline: descend for the digits, cruise the relationships, and stop the moment the question is answered. A summary cascade can’t do this — by the time it’s compressed to the top, provenance is gone and you can’t interrogate anything below. A ladder keeps the legs attached.
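The same walk can be sketched as a small controller that records which layers it touched. Everything here is a toy: the in-memory `L0_MAP`, `L1_PAGES` and `LIVE_SOURCE` stand in for a real map, real pages and a real repository, a regex over `def` lines stands in for a tree-sitter skeleton, and keyword matching stands in for the agent’s judgment.

```python
import re

# Hypothetical cached layers (L0 and L1) and live source (L4).
L0_MAP = {
    "twilio-agent": "voice agent over telephony",
    "realtime-exp": "voice experiment on a realtime API",
    "billing-tool": "PDF receipt parser",
}
L1_PAGES = {
    "twilio-agent": [("considered", "barge-in", "DEFERRED")],
    "realtime-exp": [("relies-on", "OpenAI Realtime", "native interruption")],
}
LIVE_SOURCE = {
    "twilio-agent": (
        "def on_media(chunk):\n"
        "    pass\n"
        "\n"
        "# TODO: barge-in, needs Media Stream clear\n"
        "def _play(text):\n"
        "    pass\n"
    ),
}


def descend(topic: str, pattern: str) -> dict:
    """Answer from the highest layer possible, recording each layer touched."""
    regex = re.compile(pattern)
    trail = ["L0"]
    candidates = [name for name, scent in L0_MAP.items() if topic in scent]

    trail.append("L1")
    # Only claims about intent ("DEFERRED") need checking against the code.
    unverified = [
        name for name in candidates
        if any(note == "DEFERRED" for _, _, note in L1_PAGES.get(name, []))
    ]

    shipped, hits = [], []
    for name in unverified:
        source = LIVE_SOURCE[name]
        if "L2" not in trail:
            trail.append("L2")
        # L2 stand-in: signatures only, regenerated from the live source.
        signatures = re.findall(r"^[ \t]*def (\w+)", source, flags=re.MULTILINE)
        if any(regex.search(sig) for sig in signatures):
            shipped.append(name)
            continue  # the skeleton answered it; no grep needed
        if "L3" not in trail:
            trail.append("L3")
        for number, line in enumerate(source.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, number, line.strip()))
    return {"trail": trail, "candidates": candidates, "shipped": shipped, "hits": hits}
```

Running `descend("voice", "interrupt|barge|cancel")` touches L0 through L3 and stops there; the returned trail never contains L4, and the only project that triggered a descent is the one whose page carried an unverified claim.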

The token math, versus reading flat

The reason this isn’t just elegant but cheaper shows up when you compare it to flat retrieval — embed everything, pull the top-k similar chunks, stuff them in context. Three findings from the retrieval literature line up behind the ladder.

One: structure-aware beats blind chunking by a mile. Splitting code on its AST boundaries hits 70.1% Recall@5 against 42.4% for fixed-size chunks [5]. The shape of the thing carries the meaning, which is exactly why L2 emits a skeleton rather than a slab. Two: more retrieval is not better retrieval. Quality peaks at five-to-ten chunks and degrades once chunks run past ~2,500 tokens [6], so flat retrieval’s instinct to pour in more context is actively counter-productive, while the ladder’s instinct to pull one map plus one skeleton is on the right side of that curve by construction. Three: the walk itself is native. Following a named edge on a page written to be read is the single most in-distribution act a language model can perform (it’s just reading), whereas formulating queries against an embedding space is not. That is why RAG loops keep surfacing redundant material (top-k chunks are similar to each other) while a wiki walk has no reason to ask for the same thing twice. The map holds the bookkeeping the model is worst at, so a cheap model on low thinking stays accurate.

Net: flat retrieval pays its full price on every question, because it can’t tell a relationship query from a digits query. It embeds and re-reads the world each time. The ladder pays the flat-retrieval-sized price (L3/L4) only on the clauses that genuinely need ground truth, and answers everything else from a cache that was compiled once and rides in the header for free. Claude Code’s own ~40% token premium for going index-free [3] is the cost of living at L3/L4 all the time; the ladder is how you buy that fidelity only when the question earns it.
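A back-of-envelope comparison makes the shape of the saving visible. Every number below is a made-up assumption chosen for round arithmetic, not a measurement, and the sketch conservatively charges the header’s tokens on every question rather than treating it as free.

```python
def flat_tokens(questions: int, chunks_per_query: int, tokens_per_chunk: int) -> int:
    """Flat retrieval: every question pays for a full top-k pull."""
    return questions * chunks_per_query * tokens_per_chunk


def ladder_tokens(questions: int, header_tokens: int,
                  descents: int, tokens_per_descent: int) -> int:
    """Ladder: every question reads the header; only some questions descend."""
    return questions * header_tokens + descents * tokens_per_descent


# Hypothetical workload: 100 questions, 30 of which need ground truth.
FLAT_TOTAL = flat_tokens(questions=100, chunks_per_query=8, tokens_per_chunk=1500)
LADDER_TOTAL = ladder_tokens(questions=100, header_tokens=2000,
                             descents=30, tokens_per_descent=6000)
```

Under these assumptions the flat approach spends 1,200,000 tokens and the ladder 380,000. The ratio depends entirely on how many questions are relationship questions, which is the variable worth measuring on a real corpus before trusting any figure like this.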

Push the header; let the agent pull down the ladder

So the design rule is short. Push the persistent header; let the agent pull down the ladder. L0 and L1 — the map and the blurry claim-pages — ride permanently in the context header, always resident, always cheap. Everything below is pull: the agent, exercising judgment, decides per question whether this one deserves a skeleton, a grep, or a full read, and it stops as soon as the answer is in hand.
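As a sketch of the push/pull split: the header is assembled once from the cached layers, and the lower rungs are exposed as tools that do nothing until called. The function names, the header layout, and the `PULL_TOOLS` registry are all hypothetical; a real agent framework would supply its own tool-calling mechanism.

```python
import re
from typing import Callable


def build_header(l0_map: dict[str, str], l1_edges: dict[str, list[str]]) -> str:
    """Push side: render L0 and L1 into one always-resident block of text."""
    lines = ["MAP"]
    lines += [f"- {name}: {scent}" for name, scent in sorted(l0_map.items())]
    lines.append("PAGES")
    for name in sorted(l1_edges):
        lines += [f"- {name} {edge}" for edge in l1_edges[name]]
    return "\n".join(lines)


def grep_tool(source: str, pattern: str) -> list[str]:
    """Pull side, L3: return only the lines that match."""
    return [line for line in source.splitlines() if re.search(pattern, line)]


def read_tool(source: str, pattern: str = "") -> list[str]:
    """Pull side, L4: return the whole source, line by line."""
    return source.splitlines()


# Nothing in this registry runs until the agent decides a question deserves it.
PULL_TOOLS: dict[str, Callable[..., list[str]]] = {
    "grep": grep_tool,
    "read": read_tool,
}
```

The asymmetry is the point of the design: `build_header` runs regardless of the question, while each entry in `PULL_TOOLS` costs nothing until a specific question justifies the call.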

Which quietly demotes a whole category of tooling. The tree-sitter parser, the ripgrep call, the DOM-flattener that turns a rendered page into its structural skeleton — in the old mental model these are pipeline stages, deterministic pre-processors you run up front to build an index. In the ladder they’re something better and cheaper: tools that materialise a resolution layer on demand. Code makes the layers cheap to produce; model judgment decides when to descend. You stop running the whole pipeline over the whole corpus on a schedule and hoping the index is fresh, and you start regenerating exactly the one skeleton the one live question needs — which, because it’s regenerated, is never stale.

And this is why the top of the ladder is the defensible part. Everyone can ship the bottom rungs — grep and fetch are commodities, and the vendor connectors prove it by shipping nothing else. The map and the compiled claim-layer take comprehension paid once per source across your whole world — code and docs and transcripts and mail under one lens — and no platform confined to its own silo can assemble that for you. Everyone is shipping the rungs. The value is at the top, and the top is unshippable by anyone but you.

The write-side ebook taught you to climb the ladder carefully, stabilising each layer before you sharpen the next. The read side is the same ladder with the blur promoted from a stage you pass through to a place you live: hold the world at its lowest useful resolution, keep the pixels in the source where they belong, and send the agent down for them one question at a time. The blur was never the thing to fix. The blur is what’s holding the whole structure up.

Build the ladder, not another index treadmill

If your agents are stuck re-indexing a corpus that changes faster than you can rebuild it — or paying frontier prices to compensate for a map you never compiled — that’s a resolution problem, not a model problem. At LeverageAI we design read ladders that cache only what ages well, so cheap models answer relationship questions accurately and descend for ground truth only when it’s earned. Talk to us about applying this to your corpus.

References

  [1] Aider. “Repository map.” Aider builds a map of the repository using tree-sitter to extract function and class signatures and structure (not bodies), ranks symbols by importance, and fits the result to an adaptive token budget, computed before each request. aider.chat/docs/repomap.html
  [2] Aider. “Building a better repository map with tree-sitter” (2023-10-22). Tree-sitter parses source into an AST; a graph of symbol references is ranked with PageRank so the most important definitions surface first, rendered as signatures and scope, not implementations. aider.chat/2023/10/22/repomap.html
  [3] Claude Code / analyses of its retrieval approach. Claude Code deliberately uses no RAG index, exploring the live codebase agentically (grep/read) instead, at a cost of roughly 40% more tokens than an index-based approach, trading tokens for always-current ground truth.
  [4] Cursor. “Codebase indexing / security.” Cursor computes a Merkle tree of file hashes locally and syncs it with the server; because only changed files and their parent hashes differ, it re-embeds just the modified chunks, and stores embeddings plus obfuscated paths (source code stays local). cursor.com/security
  [5] AST-aware code chunking benchmark. Splitting code on abstract-syntax-tree boundaries achieves 70.1% Recall@5 versus 42.4% for fixed-size chunking, indicating structure-preserving chunks retrieve substantially better than blind splits.
  [6] Retrieval chunk-count / chunk-size research. Retrieval quality peaks at roughly five-to-ten chunks and degrades once individual chunks exceed ~2,500 tokens; adding more or larger chunks past that point reduces answer quality rather than improving it.
  [7] LeverageAI, related canon (context, not statistics): Progressive Resolution (the write-side ladder this inverts), The Index Is the Data (the compiled map substrate), Context Engineering (the persistent-header / working-set principle), and RAG Was Built for Chatbots (the substrate debate this article stays out of). leverageai.com.au

© 2026 Leverage AI, Scott Farrell. All rights reserved. This content is made available on a limited, revocable, read-only basis only. No licence or right is granted to copy, reproduce, republish, scrape, store, adapt, summarise, index, embed, or use this content to create derivative works, work product, deliverables, methodologies, training materials, prompts, templates, software, services, research, or commercial outputs, whether by humans or machines, without prior written permission. This restriction includes internal business use, client work, consulting, advisory, implementation, and any use in or for artificial intelligence, machine learning, data extraction, retrieval, evaluation, fine-tuning, or knowledge-base construction.