AI Strategy · Memory Substrates
Don’t Migrate Your RAG to a Wiki
I’ve spent a year arguing that agents want a wiki, not a retrieval index — and I stand by every word of it. So this is the honest boundary on my own case: some corpora should stay a RAG, and converting them is a mistake. The rule that tells you which is which is query shape × reuse frequency × loss tolerance, and the upgrade path for a working RAG is stratification, not migration.
By Scott Farrell · LeverageAI
TL;DR
- Substrate is a decision, not an allegiance. Run the rule per corpus: query shape × reuse frequency × loss tolerance. It routinely sends two corpora owned by the same person to two different substrates.
- RAG’s home turf is the exhaustive prior-art sweep. “Find all the ways people solved X” is recall-oriented, corpus-shaped and single-hop. You want the seventeen variant solutions, not one compacted claim — and synthesis is lossy in exactly that dimension.
- Wiki ingestion is expensive synthetic augmentation. It’s a pile of LLM calls that compiles understanding once. That cost only amortizes under reuse: daily email triage pays it back thousands of times; an occasional prior-art lookup never does.
- Stratify, don’t migrate. Keep the RAG as the raw layer and let a thin wiki atlas grow on top — only recurring themes earn a page, and each page routes back down into the RAG for specifics. The index is the data; the RAG is the digits.
- For live, churning sources, skip the maintained vector index. ripgrep and BM25 at query time beat an index you have to keep re-embedding. If anything gets embeddings, embed the compiled wiki pages once you pass a few hundred of them, not the raw material.
The question I kept getting wrong on purpose
Over the last year I’ve published a fairly loud argument: that retrieval-augmented generation was built for chatbots, that agents need a compiled worldview instead, that the index is the data and a self-cleaning wiki-graph out-thinks a vector store on the workloads that matter. I believe all of it. My own email triage went from hopeless to genius the day I gave the agent a wiki instead of a pile of embeddings, and I’ve watched the same substrate turn a utility-class model into something that reads my inbox like it knows me.
So the reasonable next move — the one several people have asked me about, and the one I caught myself reaching for — is to convert everything. I have other corpora sitting in RAG: a scanner that reads everything on Reddit about the agent framework I run my workloads on and files it away, a dev folder with a hundred project directories, years of research exhaust. If the wiki is so much better, migrate them all. Wiki-fy the lot.
That instinct is wrong, and it’s worth being precise about why, because the reason it’s wrong is the same reason the wiki is right where it’s right. A claim with an honest boundary is stronger than a slogan. Telling you when not to use my favourite tool is how you know the rest of the case was engineering and not enthusiasm. This piece is the boundary. It doesn’t retract the canon — it bounds it, and bounding is where confidence lives.
One fence up front, so we don’t relitigate settled ground. Why wikis beat RAG for open-ended agent reasoning is a separate argument I’ve already made in RAG Was Built for Chatbots, and why a model can walk a wiki fluently but flails driving a vector search is covered in Why LLMs Can Walk a Wiki But Can’t Drive a RAG. This article assumes all that and asks the opposite question: given the wiki is usually the answer, which corpora are the exception, and how do you tell before you spend the money?
The rule: query shape × reuse × loss tolerance
I’ve written a lot about the Lane Doctrine — deploy each part of a system where the physics is on your side, instead of forcing one substrate to do every job. Memory substrates have their own lanes, and three properties of a corpus decide which lane it belongs in.
Query shape. What does a typical question actually ask the corpus to do? There are two archetypes and they pull in opposite directions. One is recall: “surface every instance of X” — broad, single-hop, exhaustive, where completeness is the whole point and you’d rather over-return than miss one. The other is synthesis: “what should I conclude, given everything the corpus knows about X” — multi-hop, relational, where the answer is a compiled judgement that spans many sources. RAG is built for the first shape; a wiki is built for the second.
Reuse frequency. How often is the same compiled understanding asked for? This is the economic axis and almost nobody weighs it, because RAG conditioned everyone to think retrieval is free. It isn’t, once you switch substrates — and the difference in build cost is enormous.
Loss tolerance. If the substrate compresses the source into something smaller and self-describing, does that compression destroy the thing you needed? For some queries a good summary is better than the raw material. For others the specific, uncompacted variants are the answer, and any synthesis that smooths them into a claim has thrown away the signal.
The substrate rule
Recall-shaped queries, low reuse, low loss tolerance → keep it a RAG. Synthesis-shaped queries, high reuse, high loss tolerance → compile it into a wiki. Most real corpora sit somewhere in between — which is what stratification is for.
Here is the pairing that makes the rule click, and it’s two corpora I own personally, sent to opposite substrates by the same rule.
Stays a RAG
The prior-art scanner
- Query: “find all the ways people solved X” — recall, single-hop, exhaustive.
- Reuse: occasional. You sweep it when you hit a new problem, not daily.
- Loss: intolerable. You want the seventeen variants, not the average of them.
- Verdict: synthesis would delete the product. Leave it raw.
Becomes a wiki
The email worldview
- Query: “is this worth interrupting me, given what I know and care about” — synthesis, relational.
- Reuse: constant. Triage runs against the same worldview every day.
- Loss: welcome. You want the compiled judgement, not the raw thread.
- Verdict: compilation pays for itself thousands of times. Compile it.
Same person, same tooling, same models. The corpora go to different substrates because the jobs are different, and once you see that, “should I migrate my RAG to a wiki” stops being an ideological question and becomes an arithmetic one.
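To make the arithmetic concrete, here is a minimal sketch of the rule as a function. The names (`Corpus`, `QueryShape`, `choose_substrate`) are mine, invented for illustration, and collapsing reuse and loss tolerance to booleans is a simplifying assumption; in practice both are matters of degree.

```python
from dataclasses import dataclass
from enum import Enum


class QueryShape(Enum):
    RECALL = "recall"        # "surface every instance of X"
    SYNTHESIS = "synthesis"  # "what should I conclude about X"


@dataclass(frozen=True)
class Corpus:
    name: str
    query_shape: QueryShape
    high_reuse: bool      # is the same compiled understanding read often?
    loss_tolerant: bool   # does compaction keep what you came for?


def choose_substrate(corpus: Corpus) -> str:
    """Apply the three-axis rule; any mixed case falls through to stratification."""
    if (corpus.query_shape is QueryShape.RECALL
            and not corpus.high_reuse and not corpus.loss_tolerant):
        return "rag"
    if (corpus.query_shape is QueryShape.SYNTHESIS
            and corpus.high_reuse and corpus.loss_tolerant):
        return "wiki"
    return "stratify"


# The two corpora from the pairing above.
scanner = Corpus("prior-art scanner", QueryShape.RECALL,
                 high_reuse=False, loss_tolerant=False)
email = Corpus("email worldview", QueryShape.SYNTHESIS,
               high_reuse=True, loss_tolerant=True)

for corpus in (scanner, email):
    print(f"{corpus.name}: {choose_substrate(corpus)}")
```

Only the two clean corners get a definite answer. Everything else returns "stratify", which matches the claim that most real corpora sit in between.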
RAG’s home turf: the prior-art sweep
Let me make the case for RAG properly, because I don’t want this read as a grudging concession. There is a query shape RAG is genuinely, structurally the best tool for, and it’s the exhaustive prior-art sweep.
My scanner reads everything being said about the agent framework I build on and drops it into a vector store. When I hit a problem — say, how to persist state across runs, or how to work around a provenance gap — the question I bring is: show me every approach anyone has tried. Not the best one. Not the consensus. Every one. I want the seventeen variant solutions laid out so I can see the whole design space, spot the clever outlier, and steal the idea nobody upvoted. That is recall, and it is exactly the shape top-k similarity retrieval was built to serve: cast a wide net over a corpus and haul back everything near the query.
Now watch what a wiki would do to that. A wiki’s whole value is that an agent reads each source and synthesizes it into the worldview — it compacts, cross-references, resolves contradictions, and files a claim. That is precisely the wrong operation here. “The common approaches to state persistence are A and B” is a synthesis that has deleted approaches C through Q — the weird ones, the abandoned ones, the one that’s wrong in general but perfect for my case. The compaction that makes triage brilliant is the compaction that destroys a prior-art sweep. It’s the same property producing opposite verdicts, and loss tolerance is the dial that decides which one you get.
Synthesis is lossy in exactly the dimension a prior-art sweep needs. You went looking for the variants; the wiki’s job is to erase them.
There’s a retrieval-quality subtlety worth being honest about, because RAG isn’t magic at recall either. Vector search over raw chunks has real failure modes: approximate-nearest-neighbour indexes can end their search before exploring the parts of the graph that held your best match, and the effect gets worse at low k [1]; naive fixed-size chunking retrieves far worse than structure-aware chunking (roughly 42% versus 70% Recall@5 in code-retrieval benchmarks) [2]; and hybrid keyword-plus-vector search improves recall by 15–30% over either method alone [2]. The point isn’t that RAG is flawless on its turf. It’s that its failures on recall are tuning problems (better chunking, hybrid search, higher k) that keep the variants in play, whereas a wiki’s “failure” on recall is by design: it was built to throw the variants away. You tune a RAG toward completeness. You can’t tune a wiki back into it.
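Two of those tuning levers are easy to see in code. The sketch below computes Recall@k and merges a keyword ranking with a vector ranking using reciprocal rank fusion. RRF is one common fusion method that I have swapped in for illustration; the benchmarks cited above do not say which fusion they used. The document ids are toy data and will not reproduce the quoted percentages.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = relevant_ids.intersection(ranked_ids[:k])
    return len(hits) / len(relevant_ids)


def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Merge ranked lists; each item scores the sum of 1 / (c + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy data, invented for illustration: four variants are relevant.
relevant = {"a", "b", "c", "d"}
keyword_hits = ["a", "x", "b", "y"]
vector_hits = ["c", "a", "z", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])

print(recall_at_k(keyword_hits, relevant, k=4))  # keyword search alone
print(recall_at_k(fused, relevant, k=4))         # hybrid at the same k
print(recall_at_k(fused, relevant, k=7))         # hybrid with a higher k
```

On this toy data, fusing the lists recovers more variants at the same k, and raising k recovers the rest. Both levers move a RAG toward completeness without discarding anything.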
Why the wiki is expensive, and why that decides it
The axis that actually settles most of these calls is cost, and it’s the one RAG made everyone forget. Dropping a document into a vector store is cheap: chunk it, embed it, write the vectors. Ingesting a document into a wiki is not. An agent reads the source, walks the existing wiki to see how it connects, drafts claims and edges, and a more expensive model reviews and commits the mutation. It’s a chain of LLM calls per source — genuine synthetic augmentation, not a summary but a compiled, cross-linked representation that’s smaller than the source and self-describing. That compilation is the wiki’s superpower. It’s also its bill.
You don’t have to take my word for the order of magnitude. When Cognition built DeepWiki (auto-generated wikis for public code repositories), indexing more than 50,000 repositories reportedly cost around $300,000 in compute, and the wikis are regenerated on a schedule rather than on every commit, presumably because re-compilation is too expensive to run continuously [3]. That’s compilation economics in one data point: the understanding is valuable because it was expensive to produce, and you only pay it back by reading the result many times.
Which is the whole game. Compiling a corpus into a wiki is a capital expense — paid once, up front, per source. It only makes sense when the compiled understanding is reused enough to amortize that cost. This is the same capex-versus-opex logic I laid out in Context Arbitrage, pointed at the substrate decision:
- Email triage runs against the compiled worldview every single day, on every incoming message. The ingestion cost is divided across thousands of reads. The wiki is the best money in the stack.
- A prior-art sweep happens when I hit a new problem — occasionally, unpredictably, and rarely twice the same way. There’s no reuse to amortize against. Every dollar spent compiling that corpus into a wiki is a dollar spent building a beautiful synthesis I’ll consult once and that deletes the variants I needed anyway.
So the RAG stays a RAG for two independent reasons that happen to agree: its query shape wants recall the wiki can’t provide, and its reuse frequency can’t amortize the compilation cost the wiki demands. When both the capability axis and the economic axis point the same way, the decision isn’t close.
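The amortization argument is simple enough to write down. This is a rough sketch: only the DeepWiki figures come from the reference above. `INGEST_COST`, the read counts and the per-read saving are hypothetical numbers chosen to show the shape of the trade, not measurements.

```python
def cost_per_read(ingest_cost: float, reads: int) -> float:
    """Up-front compilation cost spread across every later read."""
    if reads <= 0:
        return float("inf")  # never read: the cost is never paid back
    return ingest_cost / reads


def break_even_reads(ingest_cost: float, saving_per_read: float) -> float:
    """Reads needed before per-read savings cover the up-front cost."""
    if saving_per_read <= 0:
        return float("inf")
    return ingest_cost / saving_per_read


# From the DeepWiki figures quoted above: about $300,000 for 50,000 repositories.
print(300_000 / 50_000)  # 6.0 dollars per repository on average

# Hypothetical numbers from here on.
INGEST_COST = 200.0  # assumed dollars to compile one corpus into a wiki
print(cost_per_read(INGEST_COST, reads=5_000))  # triage: thousands of reads
print(cost_per_read(INGEST_COST, reads=1))      # a one-off prior-art sweep
print(break_even_reads(INGEST_COST, saving_per_read=0.05))
```

The triage corpus pays cents per read. The sweep corpus carries the whole bill on a single lookup, before counting the variants it lost.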
Stratification: the upgrade path that isn’t migration
“Leave it a RAG” sounds like standing still, and I don’t want to leave you there, because there is an upgrade path — it’s just not the one the word “migrate” implies. Migration says: lift the corpus out of one substrate and set it down in the other. Stratification says: keep the corpus exactly where it is, and grow a thin new layer on top of it.
Keep the RAG as the raw layer — the complete, un-synthesized, recall-optimized ground truth. Then let a thin wiki atlas accrete above it. The discipline that keeps it thin is a single rule: only recurring themes earn a page. The first time you sweep the corpus for “state persistence approaches,” you just run the RAG. The third or fourth time you find yourself asking a variant of the same question, that theme has proven it’s worth compiling — so it earns an atlas page. And critically, the page doesn’t replace the RAG results. It routes down into them.
The stratified stack — index on top, digits underneath
THIN WIKI ATLAS (compiled, cheap to read, only recurring themes)
┌───────────────────────────────────────────────┐
│ page: "state persistence approaches"          │
│   · the shape of the design space (synthesis) │
│   · edges to neighbouring themes              │
│   · ▼ pointers, NOT a replacement ▼           │
└───────────────────┬───────────────────────────┘
                    │ routes down for specifics
                    ▼
RAG RAW LAYER (complete, un-synthesized, recall-optimized)
┌───────────────────────────────────────────────┐
│ every thread, every variant, every outlier    │
│ chunk C … chunk Q (the seventeen solutions)   │
└───────────────────────────────────────────────┘
The index is the data; the RAG is the digits.
You get the wiki’s gift — a one-screen map of a sprawl, an agent that can orient before it dives — without paying to compile the whole corpus and without losing a single variant, because the variants still live intact in the layer below. The atlas is an index over the RAG, not a synthesis that supersedes it. This is the literal meaning of the line from The Index Is the Data, applied to a hybrid: the compiled index is the layer you read first and reason over, and the RAG holds the full-resolution digits you drill to when the map says “here, but go deeper.”
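The "only recurring themes earn a page" discipline can be sketched as a counter with a threshold. `AtlasGate` and `PROMOTE_AFTER` are hypothetical names, and the threshold of 3 is my reading of "the third or fourth time". The sketch also assumes the caller labels each sweep with a theme; recognising that two questions are variants of the same theme is the hard part, and it is not shown here.

```python
from collections import Counter

PROMOTE_AFTER = 3  # assumption: "the third or fourth time" as a threshold


class AtlasGate:
    """Counts RAG sweeps per theme and reports when a theme earns a page."""

    def __init__(self, promote_after: int = PROMOTE_AFTER) -> None:
        self.promote_after = promote_after
        self.sweeps = Counter()        # theme -> number of sweeps so far
        self.pages: set[str] = set()   # themes that already have a page

    def record_sweep(self, theme: str) -> bool:
        """Record one sweep; return True only on the sweep that earns a page."""
        self.sweeps[theme] += 1
        if theme not in self.pages and self.sweeps[theme] >= self.promote_after:
            self.pages.add(theme)
            return True
        return False


gate = AtlasGate()
for attempt in range(1, 5):
    earned = gate.record_sweep("state-persistence-approaches")
    print(attempt, earned)
```

Every sweep still runs against the RAG. The gate only decides when the compilation spend is justified, so the atlas stays thin by construction.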
If you want the atlas page as a concrete schema, here’s the shape I use — deliberately small, so the discipline stays honest:
Solutions-atlas page schema
theme:      state-persistence-approaches
one_liner:  how people keep agent state across runs
the_shape:  2–4 sentences on the design space, NOT a winner
dimensions: the axes the variants differ on (durability,
            cost, blast-radius), so a reader can navigate
rag_query:  the exact query that pulls the full variant set
edges:      neighbouring atlas themes
earned_on:  date the theme recurred enough to warrant a page
Notice what the schema refuses to contain: the answer. It stores the shape of the design space and the query that fetches the specifics — never a compacted “the best approach is X.” That refusal is what keeps stratification honest. The moment an atlas page starts absorbing the variants instead of pointing at them, you’ve quietly migrated after all, and you’ve reintroduced exactly the loss the RAG existed to prevent.
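As a typed sketch, the schema and its refusal look like this. The field names follow the schema above. `route_down`, `fake_rag_search`, the example values and the date are hypothetical stand-ins for whatever vector-store call and content you actually have.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class AtlasPage:
    theme: str
    one_liner: str
    the_shape: str                # the design space, not a winner
    dimensions: tuple[str, ...]   # axes the variants differ on
    rag_query: str                # query that pulls the full variant set
    earned_on: date
    edges: tuple[str, ...] = ()   # neighbouring atlas themes
    # Deliberately no "answer" or "best_approach" field.


def route_down(page: AtlasPage, rag_search: Callable[[str], list[str]]) -> list[str]:
    """Read the map first, then fetch every variant from the raw layer."""
    return rag_search(page.rag_query)


def fake_rag_search(query: str) -> list[str]:
    # Stand-in for a real vector-store call; returns placeholder chunk ids.
    return ["chunk-C", "chunk-D", "chunk-Q"]


page = AtlasPage(
    theme="state-persistence-approaches",
    one_liner="how people keep agent state across runs",
    the_shape="Variants trade durability against cost and blast radius.",
    dimensions=("durability", "cost", "blast-radius"),
    rag_query="approaches to persisting agent state across runs",
    earned_on=date(2026, 1, 1),  # placeholder date
)
print(route_down(page, fake_rag_search))
```

Because the dataclass is frozen and has no field for a conclusion, absorbing the variants into a page requires a visible schema change rather than a quiet drift.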
Live sources: don’t maintain an index you’ll have to rebuild
There’s a second class of corpus where the reflexive move is also wrong, and it’s the one closest to home for most builders: a working set of files that changes — a dev folder, a notes directory, anything under active edit. The instinct absorbed from every code assistant is “build a vector index over it.” I’d push back on that too, and the reasoning is a cousin of the amortization argument.
A maintained vector index over live source is a standing liability. The source churns, so the embeddings rot, so you need change-detection and re-embedding machinery to keep them current: the sort of Merkle-tree-of-file-hashes, re-embed-only-what-moved plumbing the commercial code indexers had to build precisely because keeping a vector index synced to a moving codebase is a real and ongoing cost [4]. You’re paying, continuously, to keep a derived artifact in step with a source that won’t hold still. For live material, that’s the wrong trade.
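To show what that plumbing involves, here is a deliberately simplified sketch of hash-based change detection. It uses a flat manifest of per-file SHA-256 digests, not a true Merkle tree; a real tree would also hash directories so unchanged subtrees can be skipped without reading them. The function names are hypothetical and this is not any particular indexer's implementation.

```python
import hashlib
from pathlib import Path


def hash_files(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its current bytes."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def plan_reembedding(
    old: dict[str, str], new: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths to re-embed, paths whose vectors should be deleted)."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


# Toy manifests standing in for two snapshots of a dev folder.
before = {"a.py": "h1", "b.py": "h2", "old.md": "h3"}
after = {"a.py": "h1", "b.py": "h2-edited", "new.md": "h4"}
print(plan_reembedding(before, after))
```

Even this toy version has to run on every change, persist its manifest, and drive embedding and deletion jobs downstream. That is the recurring cost the paragraph above is pointing at.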
Two things beat it. For most questions, search at query time instead of maintaining an index: ripgrep and BM25 run over the current bytes on disk, so they’re never stale, they need no upkeep, and BM25 in particular is now fast enough that “just search when you ask” is a serious answer rather than a fallback [5]. It’s telling that at least one frontier coding agent deliberately ships no RAG index at all, compensating with live agentic exploration (grep, read, follow) and spending roughly 40% more tokens in exchange for never maintaining a stale index [4]. Query-time search is the ripgrep end of the same ladder.
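For a sense of how little machinery "just search when you ask" needs, here is a minimal, unoptimized sketch of BM25 scoring computed at query time. It is the textbook Okapi BM25 formula with a Lucene-style smoothed IDF, not the API of the BM25S library cited above. The documents are toy strings; in practice you would read the current files from disk on each query.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_search(
    query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75
) -> list[tuple[str, float]]:
    """Score every document at ask time; nothing is persisted between calls."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokens.values()) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    query_terms = set(tokenize(query))
    results = []
    for name, terms in tokens.items():
        counts = Counter(terms)
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        if score > 0:
            results.append((name, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Toy documents, invented for illustration.
docs = {
    "notes/state.md": "persist agent state across runs with sqlite",
    "notes/email.md": "triage rules for the inbox",
    "src/checkpoint.py": "def save_state(run): write state checkpoint to disk",
}
hits = bm25_search("persist state", docs)
print([name for name, _ in hits])
```

Nothing here survives the call, so there is nothing to go stale. The trade is recomputing statistics on every query, which stays acceptable as long as the working set is small enough to scan.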
And if a corpus genuinely grows large enough that you want embeddings for semantic reach, embed the right layer. Karpathy landed in the same place building his own LLM-maintained wiki: he keeps a plain markdown index that works fine up to a few hundred pages on nothing but the file itself, and only past that threshold does he reach for qmd (hybrid BM25-plus-vector search with LLM re-ranking), run over the compiled wiki pages, not the raw sources [6]. That’s the move worth stealing: when you finally embed, embed the thin, synthesized, slow-moving atlas, because it’s small and it ages well, not the churning raw material, which is large, stale on arrival, and expensive to keep current. Embed what compounds; search what churns.
The decision, distilled
- Substrate is a per-corpus decision. Run the rule — query shape × reuse frequency × loss tolerance — on each corpus, not once for your whole stack. Two corpora you own can correctly land on two substrates.
- Recall wants RAG. “Find every way people solved X” is single-hop, exhaustive, and hostile to synthesis. If completeness is the product, a vector store is the right tool, not a legacy one.
- Synthesis that deletes the variants is a feature or a catastrophe, depending on the query. Loss tolerance is the dial. High tolerance → compile. Low tolerance → leave it raw.
- Reuse is the economic gate. Wiki ingestion is capex — expensive synthetic augmentation paid once per source. It only wins where the same understanding is read often enough to amortize it. Daily triage, yes. Occasional lookup, never.
- Stratify, don’t migrate. Keep the RAG as the raw layer; grow a thin wiki atlas on top where only recurring themes earn a page; make each page route down into the RAG. The index is the data; the RAG is the digits.
- An atlas page indexes; it never absorbs. Store the shape of the design space and the query that fetches the specifics — never the compacted answer. The day a page swallows the variants, you’ve migrated by accident.
- For live sources, search don’t index. ripgrep and BM25 at query time beat a vector index you have to keep re-embedding. If you must embed, wait until the wiki passes a few hundred pages, then embed those compiled pages (the layer that’s small and ages well), not the raw material.
Run the rule on one of your own corpora
Pick a corpus you were about to “wiki-fy” — a scanner, a dev folder, a research pile — and answer three questions honestly: is the typical query recall or synthesis, how often is the same understanding reused, and would compacting it destroy what you came for? If it’s recall-shaped, rarely reused, and loss-intolerant, leave it a RAG and grow a thin atlas on top instead. Tell me what you found — I read every reply, and the boundary cases are the interesting ones.
References
- [1] Upstash Engineering. “Building a RAG Chatbot over Wikipedia” (JVector / DiskANN search heuristic). The approximate graph search “concludes, assuming the current results are sufficiently good… it may mean some high-quality results in the graph remain unexplored,” an effect “more apparent with lower top-K values requested… before exploring parts of the index graph that might be more relevant to the query.” upstash.com/blog
- [2] Code-retrieval chunking & hybrid-search benchmarks (as aggregated in the source research). “AST chunking achieves 70.1% Recall@5 vs 42.4% for fixed-size”; “Hybrid search improves recall 15–30% over single-method retrieval”; “Quality peaks at 5–10 chunks; degrades above 2500 tokens/chunk.” See also Aider’s repo map (AST graph + PageRank, no embeddings), aider.chat/docs/repomap.html
- [3] DeepWiki / Cognition AI. “They indexed over 50,000 repositories among the most popular on GitHub, analyzed 4 billion lines of code, and spent about $300,000 in computing power just for this indexing phase.” “Wikis are regenerated on a schedule, not on every commit.” deepwiki.com
- [4] Cursor codebase indexing & the no-RAG alternative. “A Merkle tree of file hashes is computed locally and synchronized… small edits change only the hashes of the edited file… enabling efficient differential updates”; “Creating embeddings is the expensive step, which is why Cursor does it asynchronously.” Contrast: “Claude Code: deliberately no RAG, compensates with agentic exploration (~40% more tokens).” (source research digest)
- [5] BM25S. Pure numpy/scipy BM25 implementation reported as “500x faster than rank_bm25,” making keyword search at query time a first-class option over maintaining a vector index. github.com/xhluca/bm25s
- [6] Andrej Karpathy. “LLM Wiki” (gist, April 2026). index.md is a “content-oriented catalog… works well up to a few hundred pages without needing vector search”; “qmd: local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking… useful once your wiki grows past a few hundred pages”, run over the wiki, not the raw sources. gist.github.com/karpathy/442a6bf555914893e9891c11519de94f