File Back the Walk
Every time an agent queries your knowledge base it produces two things worth keeping, and most systems keep neither. Karpathy files the answer back into the wiki as a new page, so explorations compound instead of evaporating. Keep the path too — the walk is telemetry that tells you which edges are missing, which roads are dead ends, and which pages have gone cold. Here is how to write both back without the wiki eating its own tail.
Scott Farrell · LeverageAI · A field note for people running an agentic wiki
The short version
- The move: a good query answer is a new page, not a throwaway message. Karpathy’s LLM Wiki files answers back so “a comparison you asked for, an analysis, a connection you discovered” stop disappearing into chat history.1 The community has already turned that into a syntheses/ folder.2
- The addition: a query produces two assets at different resolutions — the compiled answer and the raw path. File the answer into the wiki with an edge back; keep the path in the immutable raw layer. A query that files back is just an ingestion whose source is the system’s own exploration.
- The discipline: a filed answer is cache, not source. Give it a “derived” type, rank it below source-backed claims, edge it to its supports so lint invalidates it when they change, and compact it first — or the wiki starts citing its own guesses back to itself.
- The free audit: a zero-model pass over stored walk transcripts surfaces missing edges, dead ends, and cold pages — feeding the janitor’s queue. The map improves from being used, not just from being fed.
Right now, in the chat window you probably have open in another tab, an assistant is doing something quietly wasteful on your behalf. You ask it a question, it fires off a web search or reads across a dozen sources, it assembles a genuinely good hundred-page research package in its working memory — and then, when it answers, it throws almost all of that away. By the next turn the durable state is the conversation text, so those hundred fetched pages have compressed down to two or three citations and a paragraph, and the rest has evaporated. Ask a follow-up and it does the whole expensive gather again from scratch. The research was the costly part of the turn, and the research is exactly the part that gets binned. Andrej Karpathy noticed this too, building a personal wiki, and drew the obvious conclusion that almost nobody acts on: if the exploration is the expensive part, make the exploration the durable part.
The one-line version
A query isn’t consumption — it’s production. It yields a filable answer and a mineable path, and a system that discards both is paying to explore and then throwing the exploration away.
The answer is a page, not a message
Start with Karpathy’s move, because it’s the anchor and it’s deceptively simple. In his LLM Wiki setup, an agent maintains a structured, interlinked collection of markdown pages over your curated sources. When you query it, the agent “searches for relevant pages, reads them, and synthesizes an answer with citations.”1 So far that’s ordinary. The non-ordinary part is what he does with the answer:
Good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn’t disappear into chat history.
— Andrej Karpathy, “LLM Wiki”
Read that against the chat-app behaviour and you can see exactly which artifact each system treats as durable. In the chat app, the conversation text is the state that survives and the research package is turn-scoped working memory — so the pages evaporate and the thin summary remains. Karpathy flips it: the conversation is the ephemeral thing and the map is what persists. The answer you just worked hard to synthesize doesn’t decay into scrollback; it becomes a page, sitting in the same graph as everything else, ready to be found and built on next time. Explorations compound instead of evaporating. The community has already run with this — one popular implementation of the pattern formalises it as a literal directory: alongside raw/ for sources and wiki/ for the persistent pages sits syntheses/, described in one line as “query answers filed back as wiki pages.”2
This is the same instinct behind our own argument that RAG was built for chatbots and agents need a wiki: retrieval that re-derives everything at query time never accumulates, while a wiki is a compounding artifact. Filing answers back is that principle turned on the queries themselves. Good so far. But there’s a second asset in every query that Karpathy’s move leaves on the floor — and if you’re running an agentic wiki rather than a single-shot one, it’s the more interesting of the two.
The answer and the path are two resolution layers of one event
Here is the piece that changes the design. When my query agent answers a question, it doesn’t do a single retrieval — it walks. It’s a cheap, fast model handed the map and a North Star: answer from the map, and descend only where warranted. Every tool call it makes — every page it opens, every edge it follows, every branch it checks and abandons — gets written into its conversation history. (That scout-then-senior machinery is its own story, told in The Scout and the Senior.) So at the end of a query I’m holding two artifacts, not one. There’s the answer — the compiled claim, cleaned up and cited. And there’s the path — the raw transcript of the whole walk that produced it.
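A minimal sketch of why the path costs nothing extra, assuming a hypothetical Walk recorder with three step kinds (open, follow, back_out). These names are illustrative, not the API of any agent framework; the point is only that logging each tool call as it happens leaves the full path sitting beside the answer when the walk ends.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    # One tool call in the walk. Hypothetical kinds: "open" a page,
    # "follow" an edge to a page, or "back_out" of a page that didn't help.
    kind: str
    page: str


@dataclass
class Walk:
    walk_id: str
    question: str
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None

    def open(self, page: str) -> None:
        self.steps.append(Step("open", page))

    def follow(self, page: str) -> None:
        self.steps.append(Step("follow", page))

    def back_out(self, page: str) -> None:
        self.steps.append(Step("back_out", page))

    def finish(self, answer: str) -> None:
        # The walk ends holding two artifacts: the answer and the full step log.
        self.answer = answer

    def pages_visited(self) -> List[str]:
        # Distinct pages in first-visit order; back-outs add no new pages.
        seen: List[str] = []
        for step in self.steps:
            if step.kind != "back_out" and step.page not in seen:
                seen.append(step.page)
        return seen


walk = Walk("w-001", "How do we cut voice-agent latency?")
walk.open("voice-agent-latency")
walk.follow("whisper-chaining")
walk.back_out("whisper-chaining")
walk.follow("openai-realtime-api")
walk.finish("(compiled answer with citations)")
print(len(walk.steps), walk.pages_visited())
# 4 ['voice-agent-latency', 'whisper-chaining', 'openai-realtime-api']
```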
Karpathy files the answer. I want to keep the path as well, and the reason is that they’re not the same information at lower resolution — they’re different information. The answer is what the walk concluded. The path is what the walk did: which pages turned out to be adjacent, which promising-looking region turned out to be a dead end, which corner of the map never got touched. A summary deletes all of that by design — and dead ends are informative. So the two artifacts want two different homes:
- The answer — when one is worth keeping — goes into the wiki as a new page, with an edge pointing back to the walk that produced it. It’s a compiled claim; it lives with the other claims.
- The path goes into the immutable raw-sources layer, exactly where a coding-session transcript sits beside the code it produced. It’s not a document you read; it’s source material you mine.
Once you see it that way, the whole thing collapses into machinery you already have. A query that files back is just an ingestion whose source is the system’s own exploration. The ingestion agent walks a source and writes pages; the query agent walks the map and writes an answer; they’re the same explorer with a different starting point and the same write tool at the end. The convergence is exact: the query agent’s senior finalizer, handed the write tool, is the ingestion senior. You don’t build a write-back pipeline — you notice you already built it, for ingestion, and point it at your own query traffic.
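That convergence can be sketched in a few lines, assuming a hypothetical in-memory RawLayer that refuses overwrites (standing in for the immutable raw-sources layer) and a wiki held as a plain dict. Source, ingest, file_back_query and the produced_by edge are illustrative names, not taken from Karpathy’s gist or any library.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Source:
    source_id: str
    kind: str  # hypothetical labels: "external" (a document) or "walk" (a query transcript)
    text: str


class RawLayer:
    """Immutable raw-sources layer: sources can be added, never overwritten."""

    def __init__(self) -> None:
        self._items: Dict[str, Source] = {}

    def add(self, source: Source) -> None:
        if source.source_id in self._items:
            raise ValueError(f"raw layer is immutable: {source.source_id} already exists")
        self._items[source.source_id] = source

    def __contains__(self, source_id: str) -> bool:
        return source_id in self._items


# The wiki as a plain dict: page title -> fields (a stand-in for markdown files).
Wiki = Dict[str, Dict[str, str]]


def ingest(source: Source, raw: RawLayer, wiki: Wiki,
           page: Optional[Tuple[str, str]] = None) -> None:
    """The single write path: keep the source in raw, and optionally file a
    page whose produced_by edge points back at the source that produced it."""
    raw.add(source)
    if page is not None:
        title, body = page
        wiki[title] = {"body": body, "produced_by": source.source_id}


def file_back_query(walk_id: str, transcript: str, answer: str, title: str,
                    keep_answer: bool, raw: RawLayer, wiki: Wiki) -> None:
    """A query write-back is an ingestion whose source is the walk itself."""
    page = (title, answer) if keep_answer else None
    ingest(Source(walk_id, "walk", transcript), raw, wiki, page)


raw = RawLayer()
wiki: Wiki = {}
ingest(Source("doc-17", "external", "(vendor docs)"), raw, wiki,
       ("openai-realtime-api", "(compiled page)"))
file_back_query("w-001", "(full walk transcript)", "(compiled answer)",
                "latency-options-compared", True, raw, wiki)
file_back_query("w-002", "(one-page lookup transcript)", "(short answer)",
                "pricing-lookup", False, raw, wiki)
print(sorted(wiki), "w-002" in raw)
# ['latency-options-compared', 'openai-realtime-api'] True
```

The design choice worth noticing is that file_back_query adds no storage logic of its own: the walk transcript always lands in raw, and whether the answer also becomes a page is a separate yes/no decision passed in by the caller.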
(This is the query-time cousin of a move I made recently for coding work in The Code Is the What; The Transcript Is the Why — there the raw asset is your agent-session transcript, distilled into a brief that records what you intended and rejected. Same instinct, different exhaust: that piece harvests your coding sessions; this one harvests the agent’s walks across its own wiki. Two transcript streams, one worldview.)
Type the derived pages, or the wiki eats its own tail
Now the danger, because filing answers back is a genuinely good idea with a genuinely bad failure mode, and the failure mode is worth being blunt about. There is a well-taken objection floating around the LLM-wiki discussion: don’t file answers back at all, because a synthesis contains no new information — it’s a rearrangement of pages already in the wiki, so all you’re doing is bloating the map with restatements of itself. Push that far enough and the wiki starts drowning in its own echoes: syntheses citing syntheses, the same claim reflected off six derived pages, and a lint pass that can no longer tell what the system actually knows from what it once said.
The objection is right about the risk and wrong about the remedy. The remedy isn’t “never file” — it’s type the derived pages honestly, so the map never mistakes a thing it computed for a thing it was told. A synthesis is a claim whose provenance is wiki-internal. That makes it cache, not source — and the schema has to say so:
The derived-page schema
- Type: derived. Provenance is wiki-internal: this page was computed from other pages, not ingested from a source. It is cache.
- Rank: below every source-backed claim.
- Edges: a supports edge to each page it was computed from. If any supporting page is edited or retired, the derived page is flagged stale — recompute or drop. It cannot silently outlive its evidence.
- Compaction: first in line, ahead of any source-backed page.
Notice what each field buys you. The type and rank together mean a synthesis can never win an argument against a source — the exact thing that stops the wiki eating its own tail. The supports edges — which you get for free, because the path already recorded which pages the walk read — turn every derived page into a lintable, invalidatable object rather than a permanent assertion. And marking them first-to-compact means the map self-cleans in the right order: when storage or coherence pressure builds, the regenerable restatements go before anything irreplaceable does. A filed answer is genuinely more useful than the raw walk that made it — it’s smaller, self-describing, and wired into the graph — but it is only ever allowed to be useful as cache. Get that one distinction into the schema and Karpathy’s move stops being a bloat risk and becomes a compounding one.
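Here is one way those four fields could become checkable code rather than conventions. It is a sketch under stated assumptions: every page carries an integer version that bumps on each edit, a retired page is simply deleted, and each supports edge records the version it was computed from. Page, RANK, lint_stale, winner and compaction_order are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical two-level ranking: a lower number wins a conflict.
RANK = {"source": 0, "derived": 1}


@dataclass
class Page:
    title: str
    kind: str  # the schema's type field: "source" (ingested) or "derived" (cache)
    version: int = 1  # assumed to bump on every edit
    # supports edges: supporting page title -> the version this page was computed from
    supports: Dict[str, int] = field(default_factory=dict)
    stale: bool = False


def lint_stale(pages: Dict[str, Page]) -> List[str]:
    """Flag derived pages whose supports were edited (version changed) or retired (deleted)."""
    flagged = []
    for page in pages.values():
        if page.kind != "derived":
            continue
        for title, read_version in page.supports.items():
            support = pages.get(title)
            if support is None or support.version != read_version:
                page.stale = True
                flagged.append(page.title)
                break
    return flagged


def winner(a: Page, b: Page) -> Page:
    """On a conflict, the better-ranked provenance wins: a synthesis never beats a source."""
    return a if RANK[a.kind] <= RANK[b.kind] else b


def compaction_order(pages: Dict[str, Page]) -> List[str]:
    """Derived pages go first (stale ones ahead of fresh ones); sources go last."""
    ordered = sorted(pages.values(), key=lambda p: (-RANK[p.kind], not p.stale))
    return [p.title for p in ordered]


pages = {
    "voice-agent-latency": Page("voice-agent-latency", "source"),
    "openai-realtime-api": Page("openai-realtime-api", "source"),
    "latency-options-compared": Page(
        "latency-options-compared", "derived",
        supports={"voice-agent-latency": 1, "openai-realtime-api": 1},
    ),
}
pages["openai-realtime-api"].version += 1  # a supporting page is edited
print(lint_stale(pages))  # ['latency-options-compared']
print(winner(pages["latency-options-compared"], pages["voice-agent-latency"]).title)  # voice-agent-latency
print(compaction_order(pages)[0])  # latency-options-compared
```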
File it only if the walk was hard
Typing the pages keeps the ones you file honest. The next question is which ones to file at all, because “every answer becomes a page” is how you get the bloat even with good types — a map full of one-line lookups nobody needed persisted. There’s a clean triage signal sitting right there in the path, and it costs nothing to read: did the scout have to do genuine multi-hop work?
If the walker traversed five pages to connect something, the map lacked a shortcut — file it. If it was a one-page lookup, discard.
The logic is almost tautological once you say it out loud. A hard walk — five pages, several hops, a real synthesis across regions that weren’t previously linked — is direct evidence that the map didn’t already contain the connection the query needed. Filing the answer adds that connection, so the next reader gets it in one hop instead of five. That’s the map learning a shortcut it was missing. A one-page lookup is the opposite: the answer was already sitting on a single page, the walk was trivial, and filing a derived restatement of a page that already exists adds nothing but a page to maintain. So the heuristic isn’t just spam control — it’s a targeting function. It files precisely the answers that represent a gap the map had, and ignores precisely the ones that don’t. The walk difficulty is the signal for whether filing back is worth it.
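The triage rule fits in a few lines. This sketch assumes the stored path is a list of (kind, page) tuples and uses a hypothetical threshold of three distinct pages; the article gives the two poles (five pages versus one), not a cut-off, so treat the number as a tuning knob.

```python
from typing import List, Tuple

Path = List[Tuple[str, str]]  # (kind, page) with kinds "open", "follow", "back_out"

# Hypothetical cut-off. The article contrasts a five-page walk (file it) with a
# one-page lookup (discard); where to draw the line in between is a tuning choice.
MIN_DISTINCT_PAGES = 3


def distinct_pages(path: Path) -> int:
    """Pages the walk actually entered, counted once each. Dead ends are
    included: exploring and abandoning a branch is part of multi-hop work."""
    return len({page for kind, page in path if kind in ("open", "follow")})


def should_file(path: Path, threshold: int = MIN_DISTINCT_PAGES) -> bool:
    """File the answer only when the walk had to do genuine multi-hop work."""
    return distinct_pages(path) >= threshold


lookup: Path = [("open", "pricing")]
hard: Path = [
    ("open", "voice-agent-latency"),
    ("follow", "whisper-chaining"),
    ("back_out", "whisper-chaining"),
    ("follow", "openai-realtime-api"),
    ("follow", "turn-taking"),
    ("follow", "barge-in"),
]
print(should_file(lookup), should_file(hard))  # False True
```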
Harvest the walk even when you bin the answer
Here’s the part that made me want to keep the path even for the queries I don’t file. Most walks are one-page lookups — by the triage rule above, their answers get discarded. But the walk itself is still telemetry about the map, and that telemetry is valuable whether or not the answer was. You don’t need a model to extract it. It’s a cheap, deterministic pass over the stored transcripts — pure shape, no comprehension — and it produces three signals, each of which is a work item for the janitor:
- Page-pairs repeatedly co-traversed → missing-edge candidates. If two pages keep showing up together in walk after walk, but there’s no edge between them, the walkers are telling you an edge should exist. They keep having to route the long way round to connect them. Add the edge and every future walk gets there directly.
- Regions entered and backed out of → dead-end log. When walks repeatedly descend into a branch and reverse out without finding what they needed, that’s a ruled-out path — and recording it means future walks inherit the ruling instead of re-earning it. This pain is real: an open Claude Code feature request notes that when an agent tries approach X, hits a wall and pivots, “that hard-won knowledge isn’t captured anywhere,” and “future sessions (or future developers) may repeat the same dead-end exploration.”3 That proposal reaches for a hand-written DECISIONS.md; the telemetry pass derives the same dead-end log automatically, from walks the system was already doing.
- Pages never visited across many walks → cold-page candidates. A page that never gets touched, no matter what people ask, is either mislabelled (readers can’t find it), genuinely useless (compact it), or missing its inbound edges (it’s orphaned). Any of the three is a janitor job.
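A sketch of that deterministic pass using only the Python standard library. It assumes walks are stored as (kind, page) tuples, treats a single page as a proxy for a region, and uses hypothetical thresholds (three co-traversals, three back-outs) that would need tuning against real traffic.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Path = List[Tuple[str, str]]  # (kind, page) with kinds "open", "follow", "back_out"


def telemetry(paths: Iterable[Path], edges: Set[FrozenSet[str]], all_pages: Set[str],
              min_co: int = 3, min_backouts: int = 3) -> Dict[str, list]:
    """Zero-model pass over stored walks: counting only, no comprehension."""
    co_traversed: Counter = Counter()
    backed_out: Counter = Counter()
    visited: Set[str] = set()

    for path in paths:
        entered = {page for kind, page in path if kind != "back_out"}
        abandoned = {page for kind, page in path if kind == "back_out"}
        visited |= entered
        backed_out.update(abandoned)  # at most one count per page per walk
        # Pairs the walk actually used together; branches it abandoned don't count.
        for a, b in combinations(sorted(entered - abandoned), 2):
            co_traversed[(a, b)] += 1

    return {
        # Signal 1: page pairs repeatedly co-traversed with no edge between them.
        "missing_edges": sorted(
            (pair, n) for pair, n in co_traversed.items()
            if n >= min_co and frozenset(pair) not in edges
        ),
        # Signal 2: pages that walks repeatedly entered and backed out of.
        "dead_ends": sorted(
            (page, n) for page, n in backed_out.items() if n >= min_backouts
        ),
        # Signal 3: pages no walk touched in this window.
        "cold_pages": sorted(all_pages - visited),
    }


direct = [("open", "voice-agent-latency"), ("follow", "openai-realtime-api")]
detour = [("open", "voice-agent-latency"), ("follow", "whisper-chaining"),
          ("back_out", "whisper-chaining")]
pages = {"voice-agent-latency", "openai-realtime-api", "whisper-chaining",
         "legacy-twilio-tts-notes"}
report = telemetry([direct] * 3 + [detour] * 3, edges=set(), all_pages=pages)
for signal, items in report.items():
    print(signal, items)
```

One design choice here is deliberate: pages a walk backed out of are excluded from that walk’s co-traversal pairs, so a dead-end detour isn’t mistaken for evidence that two pages belong together.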
Run that pass over a week of real query traffic and it prints something like this — not prose, just a work queue:
Telemetry pass — one week of query walks
- voice-agent-latency ↔ openai-realtime-api — nine walks connected these by hand this week; no edge exists. Propose edge: relates-to. Filing the one hard walk that spanned them would place it.
- whisper-chaining subtree — six walks descended looking for a low-latency path, backed out each time. Log as ruled-out; annotate the region so future walks don’t re-descend.
- legacy-twilio-tts-notes — untouched across every walk this week. No inbound edges. Candidate: re-link or compact.
Every line in that queue was generated by usage, not by ingestion, and none of it cost a single model token. That’s the deterministic half of the pendulum from Text Is the Model’s Home Turf applied to maintenance: cheap code reads the shape of the walks; the model is reserved for the janitor’s actual judgment calls about what to do with the queue. The negative space — the roads not taken, the pages not touched — turns out to be some of the most actionable signal the whole system produces, and it was being thrown away with every answer.
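The step from counts to a queue like that is also plain code. A sketch, assuming telemetry arrives as sorted lists of (pair, count), (page, count) and cold page titles, plus a hypothetical inbound-edge count per page. WorkItem and build_queue are illustrative names, and each proposal is only a suggestion: the janitor model still makes the call.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WorkItem:
    subject: str
    evidence: str
    proposal: str  # a suggestion only; the janitor model decides what to do

    def line(self) -> str:
        return f"{self.subject}: {self.evidence}. {self.proposal}"


def build_queue(missing_edges: List[Tuple[Tuple[str, str], int]],
                dead_ends: List[Tuple[str, int]],
                cold_pages: List[str],
                inbound: Dict[str, int]) -> List[WorkItem]:
    """Deterministically turn telemetry counts into janitor work items."""
    items: List[WorkItem] = []
    for (a, b), n in missing_edges:
        items.append(WorkItem(f"{a} <-> {b}",
                              f"{n} walks connected these by hand; no edge exists",
                              "Propose edge: relates-to."))
    for page, n in dead_ends:
        items.append(WorkItem(page, f"{n} walks descended and backed out",
                              "Log as ruled-out; annotate the region."))
    for page in cold_pages:
        links = inbound.get(page, 0)
        note = "no inbound edges" if links == 0 else f"inbound edges: {links}"
        items.append(WorkItem(page, f"untouched across every walk; {note}",
                              "Candidate: re-link or compact."))
    return items


queue = build_queue([(("openai-realtime-api", "voice-agent-latency"), 9)],
                    [("whisper-chaining", 6)],
                    ["legacy-twilio-tts-notes"],
                    inbound={})
print("\n".join(item.line() for item in queue))
```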
The map improves from being used, not just from being fed
Step back and look at what these two write-backs do together. Ingesting a source makes the map bigger. Querying the map — if you file the hard answers and mine every path — makes it better: the missing edges get added, the dead ends get marked, the cold pages get pruned or re-linked, and the genuinely new syntheses get filed as typed, rankable, compactable derived pages. Query-time signal becomes the janitor’s queue, which means querying the system improves the system. That’s the write-back loop that makes a knowledge base compound rather than decay — the property I argued was the whole point in The Index Is the Data, now with a third feeder. Sources feed it. Session transcripts feed it. And now its own usage feeds it — the traffic it was already carrying, harvested instead of discarded.
The economics are the ones that make the whole approach worth it, and they’re familiar from Context Arbitrage: pay a cheap model once to walk, capture the answer and the path, and every future query runs against a slightly better-connected, slightly cleaner map. The walk is a one-time cost; the improvement it deposits is permanent. Do this for a while and the map you query on Friday is measurably better-shaped than the one you queried on Monday, without a single new source going in — because a hundred people (or a hundred cron jobs) walked it in between, and every walk left a trace worth reading.
So the design principle is short enough to put on a sticky note, and it inverts how almost everyone treats a query. A query is not a read. It is a write in disguise — two writes, if you keep both resolutions of it. File back the answer when the walk was hard, typed as cache so it can never rot the map. Harvest the path always, deterministically, so the map audits itself from its own use. Karpathy gave us the first half: stop letting good answers evaporate into chat history. The second half is the one hiding in plain sight in every agentic wiki already running — the walk itself is an asset, and the map improves from being used, not just from being fed. Keep the walk.
Running an agentic knowledge base — and throwing its query traffic away?
Every walk your agents take across your wiki is a free audit of it: the missing edges, the dead ends, the cold pages, and the occasional synthesis worth filing back. At LeverageAI we build the self-improving map — scout-and-senior walks, typed derived pages that can’t rot the graph, and a deterministic telemetry pass that turns usage into the janitor’s queue. Talk to us about a knowledge base that gets better the more you use it.
References
- [1] Andrej Karpathy — “LLM Wiki” (gist, created 4 April 2026). Describes the Query operation — “You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations” — and the write-back insight this article anchors on: “good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn’t disappear into chat history.” The lint pass also flags “data gaps that could be filled with web search,” letting the wiki commission its own research. gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- [2] llm-wiki-agent (SamurAIGPT) — a community implementation of the LLM-wiki pattern that formalises answer write-back as a directory: alongside raw/ (input) and wiki/ (persistent pages) sits syntheses/, documented in one line as “query answers filed back as wiki pages,” with the query operation described as “synthesize answer from wiki pages.” The syntheses/ folder is exactly the artefact this article argues needs a type, not just a location. skillsllm.com/skill/llm-wiki-agent
- [3] anthropics/claude-code issue #15222 — “[FEATURE] Decision History Tracking with DECISIONS.md.” Independent corroboration of the dead-end-log signal: “Backtrack patterns are invisible — When we try approach X, hit a wall, and pivot to Y, that hard-won knowledge isn’t captured anywhere. Future sessions (or future developers) may repeat the same dead-end exploration,” and “when Claude presents options A, B, and C, and we choose B, the reasoning for rejecting A and C evaporates.” The proposal reaches for a hand-written file (status states ACTIVE / REJECTED / BACKTRACKED / EXPLORING); the telemetry pass in this article derives the same log automatically from stored walk transcripts. github.com/anthropics/claude-code/issues/15222
- [4] LeverageAI — related canon (named for framing, not statistics): RAG Was Built for Chatbots, Agents Need a Wiki (why a wiki accumulates and retrieval doesn’t); The Scout and the Senior (the walk machinery the path telemetry runs over); The North Star Prompt (purpose over checklist — the query agent’s brief); The Code Is the What; The Transcript Is the Why (the sibling — harvesting agent-session transcripts, the other exhaust stream); Text Is the Model’s Home Turf (the deterministic/judgment pendulum — telemetry is the deterministic half); The Index Is the Data (the self-cleaning graph these write-backs feed); Context Arbitrage (compile once, query cheap); Why LLMs Can Walk a Wiki but Can’t Drive a RAG (why the walk is clean enough to mine). leverageai.com.au