A LeverageAI Field Guide · Agentic Engineering

Designing Loops, Not Prompts

A Field Guide to Agentic Loops — and Who Holds the State Machine

The unit of AI engineering moved from the prompt to the loop. Everyone is sorting loops by what triggers them. That is the easy axis.

The axis that actually predicts whether a loop is worth running is who holds its state machine — and the loops that compound keep their state outside any single agent.

After this field guide you can:

✓ Classify any agentic loop on two axes — trigger and who owns its state
✓ Recognise the phase transition: loops that write their own sub-loops
✓ Apply the compounding test — accumulate, or become the substrate
✓ Run a five-minute self-test that tells an asset from an activity

Scott Farrell · LeverageAI · For builders of compounding-AI-research systems

Part I · The Two Axes

The Memo That Didn't Reach You

The unit of AI engineering moved from the prompt to the loop. That part is settled. The live question is which loop to build — and the answer needs a second axis nobody is talking about.

On 7 June 2026, Peter Steinberger — the creator of OpenClaw — posted two sentences that the entire AI-coding timeline then spent a week arguing about. No diagram. No repo link. It landed at around six and a half million views.¹

“Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.”

— Peter Steinberger (@steipete), June 2026

It ricocheted because it named something everyone could feel but nobody had crisply stated. It was not a new tool. It was a new posture — and the posture it replaced had been universal for two years.

Steinberger was not alone, which is part of why the line stuck. At Acquired Unplugged in June 2026, Boris Cherny — head of Claude Code at Anthropic — made the same point: that he no longer prompts the model directly, that his job now is to write the loops that prompt it.² Google’s Addy Osmani gave the wave a name: loop engineering.³

The posture that’s ending

For two years, getting something out of a coding agent meant the same dance. You wrote a good prompt. You fed it enough context. You read what came back. You typed the next thing. The agent was a tool and you held it the entire time, one turn after the other. Put plainly: you were the loop.

If you are still narrating every step to an agent like you are dictating to a secretary, that is the thing the loop replaces. The prompt is the instruction. The loop is the machine that keeps re-issuing and re-steering it — a small system that finds the work, hands it out, checks it, records what was done, and decides the next thing, while you watch instead of type.

The Live Question

The reframe is now consensus. Everyone with a strong opinion agrees you should be designing loops. The interesting question lives one level down, and almost nobody answers it cleanly: once you are designing loops, what kinds are there, and which one should you build?

The standard answer is a taxonomy by trigger — manual, scheduled, event-driven, agent-initiated. That taxonomy is correct. It is also not enough. It tells you what starts a loop, and starting is the cheap part.

The two axes of every loop

• Axis 1 — the trigger. Manual, cron, event, agent-initiated. The axis everyone already uses. Necessary, and cheap.
• Axis 2 — who holds the state machine. Your head, fixed code you wrote ahead of time, or a durable external medium any agent can read and write. The axis that actually predicts whether a loop is worth running.

This book gives you both axes, then a single dial that unifies them — Karpathy’s autonomy slider — and finally a five-minute test you can run on a loop you operate today. Part I installs the two axes and the slider. Part II dissects the frontier loop, the one that writes its own sub-loops at runtime. Part III points the whole doctrine at your work: the test that tells an accumulating loop from a compounding one, the spectacle worth ignoring, three loops you would actually build, and the verdict on whether your loop is an asset or just an activity.

Design the loop, by all means. But the part of the design that matters is not the part the timeline is arguing about. It is the second axis. Let’s start where everyone else does — with the trigger — and then go where they stop.

Four Loops, Sorted by What Starts Them

The trigger taxonomy, done honestly and completely. It is the right place to start. It is also the cheap axis — and the chapter ends by saying so.

The cleanest field demonstration of the loop taxonomy in the wild comes from Theo Browne (t3.gg), who has been running these loops hard and filming the results. A word of warning we will pay off later: Theo also burns a spectacular amount of inference doing it, and some of the spectacle is theatre. Cite him for the architecture, not the intensity. We factor the intensity out deliberately in Chapter 9.

Sorted by what starts a loop, there are four types.

Type 1 — The Self-Paced Linear Loop

One thread, one task, a stopping condition, and an “are we done? no? keep going” check at the end of every turn. This is the Ralph loop, coined by Geoffrey Huntley in July 2025 and almost insultingly simple in its purest form:

the canonical Ralph loop

while :; do cat PROMPT.md | claude-code ; done

Huntley describes it as “a bash loop that feeds an AI’s output (errors and all) back into itself until it dreams up the correct answer… brute force meets persistence,” and, more memorably, “deterministically bad in an undeterministic world.”⁵

The structure is fixed; only the content changes each iteration. It is cheap to reason about and easy to steer. But it cannot restructure itself — it grinds toward a goal along a predefined path. And the failure mode is exactly what you would predict: on a long run, the error rate climbs, because nothing external is checking the work mid-flight. Hold that failure; it is the thread that runs through the whole book.
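The same shape can be sketched in Python. This is a minimal illustration, not Huntley's implementation: `run_agent` and `is_done` are hypothetical callables standing in for the model call and the stopping condition, and the `max_turns` cap is an added assumption (the canonical one-liner simply runs until interrupted).

```python
from typing import Callable


def linear_loop(
    prompt: str,
    run_agent: Callable[[str], str],   # hypothetical: sends a prompt, returns output
    is_done: Callable[[str], bool],    # hypothetical: the stopping condition
    max_turns: int = 50,               # assumption: a cap the one-liner lacks
) -> tuple[str, int]:
    """Re-issue the same prompt, feeding back the last output, until done."""
    output = ""
    for turn in range(1, max_turns + 1):
        # Fixed structure: only the content (the previous output) changes.
        output = run_agent(f"{prompt}\n\nPrevious output:\n{output}")
        if is_done(output):
            return output, turn
    return output, max_turns


def fake_agent(text: str) -> str:
    """Toy stand-in for an agent: adds one step to whatever came before."""
    previous = text.split("Previous output:\n", 1)[1]
    return previous + "step;"


result, turns = linear_loop(
    "Build it.", fake_agent, lambda out: out.count("step;") >= 3
)
```

Note what is absent: `is_done` is the only check, and it only asks whether to stop. Nothing inspects the intermediate turns, which is the failure mode described above.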

Type 2 — The Scheduled Loop

Same machinery, but the trigger is a clock. The interesting property is not the timing — it is that the loop is decoupled from your attention. You go to sleep; you wake up to a pull request. Steinberger’s own “simple loop” is the canonical shape: tell the agent to maintain your repos, wake every five minutes, and direct work to threads — routed through an orchestrator skill so some work lands autonomously.¹

Type 3 — The Event-Triggered Loop

Now the trigger is external state changing in the world: a review comment lands, a CI check goes red, a new PR head appears. Theo’s sharpest pattern here is two agents unaware of each other on one PR — one produces, one reviews, and the PR itself is the shared blackboard between them. Neither holds the other’s state.

This is more robust than the linear loop precisely because the coordination medium is durable and external. If either agent dies, the state survives on the PR. That sentence is doing more work than it looks like — it is the entire proof of the second axis, and the next chapter is built on it.
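A toy sketch of that pattern, under loud assumptions: `PullRequest` here is an in-memory stand-in for the real PR, and `producer_tick` and `reviewer_tick` are hypothetical names. A real loop would read and write through the code host. The point is only the shape: each agent is a function that receives the blackboard and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Stand-in for the durable medium; a real PR lives on the code host."""
    commits: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


def producer_tick(pr: PullRequest) -> None:
    # Reacts only to the PR: push when every existing commit has a review.
    if len(pr.commits) <= len(pr.comments):
        pr.commits.append(f"commit {len(pr.commits) + 1}")


def reviewer_tick(pr: PullRequest) -> None:
    # Reacts only to the PR: review when a commit has no review yet.
    if len(pr.comments) < len(pr.commits):
        pr.comments.append(f"review of {pr.commits[-1]}")


pr = PullRequest()
producer_tick(pr)
reviewer_tick(pr)
# Either agent could be replaced by a fresh one here: the next tick
# needs nothing except what is already written on the PR.
producer_tick(pr)
reviewer_tick(pr)
reviewer_tick(pr)  # nothing new to review, so nothing changes
```

Neither function takes the other as an argument or keeps anything between calls, so there is no agent-held state to lose.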

Type 4 — The Agentically-Initiated Dynamic Loop

Here the agent itself starts a loop, based on what it is seeing — doing deeper analysis and spawning its own sub-loops. This is the frontier, and it deserves its own treatment, because it is not “Type 1 but fancier.” It is a different kind of object. We name it here and slow down on why it is categorically different in Part II.

The lineage everyone is rediscovering

ReAct (2022) → AutoGPT (2023) → Ralph (2025) → /goal (spring 2026) → orchestration loops (now).⁶ Single-agent Ralph is old hat; multi-agent supervision is the new layer. The discourse keeps reconstructing this ladder. The four types above are where it has currently arrived.

The Cheap Axis

Type	Trigger	In one line
1. Linear (Ralph, `/goal`)	You / a stop condition	Grinds toward a goal; error climbs with nothing checking mid-flight.
2. Scheduled (cron)	A clock	Decoupled from your attention; often infrastructure for other loops.
3. Event-triggered	External state change	The PR is the shared blackboard; state survives any one agent.
4. Agent-initiated	The agent itself	Writes its own sub-loops at runtime — a different kind of object (Part II).

That is the trigger taxonomy, complete and honest. It tells you when a loop fires. And starting is the cheap part. The taxonomy everyone reaches for is necessary, but it is not sufficient — it will not tell you whether the loop you have built will survive contact with a long run. For that, you need the axis underneath it: who holds the state machine.

Who Holds the State Machine

The trigger tells you when a loop fires. This axis tells you whether it survives. It is the reframe the whole book turns on.

Starting a loop is cheap. The axis that determines whether a loop is worth running is a different question entirely: who holds its state machine? Where does “where are we, what is done, what is next” actually live between turns?

There are three answers, and they are not arbitrary — they form a ladder of durability.

Where the state lives	What that means	If the agent dies mid-run…
In your head	You are the loop. You remember what is done and decide the next turn. The pre-loop-engineering default.	Nothing survives but your memory. It does not scale past your attention.
In fixed code you wrote ahead of time	The structure is hard-coded: phases, prompts, branches. Types 1–3 live here.	State the code persisted survives. Structure you did not anticipate cannot appear.
In a durable external medium	The PR, the wiki, a state file — a blackboard any agent can read and write.	The state survives any single agent. The medium is the memory.

This Is an Old Insight, Generalised

None of this is new for us; it is the spine of the long-running-agents work. The counter-intuitive finding there was that agents which run for ten hours do not stuff everything into context. They implement stateless workers plus stateful orchestration: the agent is stateless, the orchestration is stateful, and state persists externally.

So the temptation — give the agent more memory so it gets smarter over a long run — is precisely backwards. More memory inside the agent means more historical cruft, more attention diffusion, faster degradation. The agents that stay sharp at hour ten are the ones that carry almost nothing forward in context and read their state back from somewhere durable.

Why Type 3 Was Robust

Now we can pay off the sentence we left hanging in Chapter 2. The two-agents-on-a-PR pattern is robust because the state machine is held by the PR, not by either agent. The producer does not hold the reviewer’s state; the reviewer does not hold the producer’s. Both read and write a durable, external blackboard.

This is the principle worth tattooing on the inside of your eyelids: durability beats coordination. You do not make a multi-agent loop reliable by making the agents coordinate better — better handshakes, richer message-passing, smarter shared context. You make it reliable by putting the state somewhere that survives any one of them. Coordination is brittle because it lives between agents that can each die. A blackboard is robust because it lives outside all of them.

The trigger tells you when a loop fires. Who holds the state machine tells you whether it survives. Optimise the second one.

Mini-case: pull the plug

Two loops, same model, same prompt. Loop A keeps “where we are” inside one agent’s running context. Loop B writes it to a state file (or a PR) after every step. Now pull the plug halfway through. Loop A starts cold tomorrow — the work is gone. Loop B’s fresh agent reads the file and resumes from the last checkpoint. The only difference between a loop that survives a crash and one that does not is where the state lived.
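Loop B can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed format: the JSON state file, the `run_steps` helper, the step names, and the simulated crash are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def run_steps(
    state_path: Path, steps: list[str], crash_after: Optional[int] = None
) -> list[str]:
    """Run steps in order, persisting progress after each one (Loop B)."""
    # A fresh agent starts here: it reads state back instead of remembering it.
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    for step in steps[len(done):]:
        if crash_after is not None and len(done) >= crash_after:
            raise RuntimeError("plug pulled")      # simulated crash
        done.append(step)                          # stand-in for the real work
        state_path.write_text(json.dumps(done))    # checkpoint after every step
    return done


steps = ["collect", "draft", "review", "publish"]
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "state.json"
    try:
        run_steps(state, steps, crash_after=2)     # first agent dies halfway
    except RuntimeError:
        pass
    survived = json.loads(state.read_text())       # what the crash left behind
    resumed = run_steps(state, steps)              # second agent resumes
```

The second call knows nothing about the first. Everything it needs is in the file, which is why it picks up at the third step rather than starting cold.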

This is the axis. Everything in the rest of the book is a consequence of taking it seriously — the autonomy slider in the next chapter is just this ladder rotated into a dial, and the durability argument returns, fused with its twin, in Chapter 6.

One Dial, Not a List

Sort the four loop types by how much of the state machine you hand over, and the flat list becomes a spectrum. Andrej Karpathy already named the spectrum.

Once you stop sorting loops only by what triggers them and start sorting them by how much of the state machine you have handed to the agent, the flat list of four collapses into something more useful: a single dial. And the dial is not ours to coin. Karpathy named it.

The Autonomy Slider

In his June 2025 talk “Software Is Changing (Again),” delivered at Y Combinator’s AI Startup School, Karpathy described an autonomy slider — drawn from his years on Tesla Autopilot, where, as he put it, “over the course of my tenure there, we did more and more autonomous tasks for the user.”⁷ The prescription:

“Depending on the complexity of the task at hand, you can tune the amount of autonomy that you’re willing to give up for that task.”

— Andrej Karpathy, “Software Is Changing (Again),” June 2025

His image for it is the Iron Man suit: “it’s both an augmentation and Tony Stark can drive it. And it’s also an agent… this is the autonomy slider. We can build augmentations or we can build agents.” And critically, the talk does not end on “max autonomy.” It ends on the opposite. Karpathy warns that builders are getting “way too excited” about fully autonomous agents, and prescribes a leash.

That is the whole correction this chapter exists to deliver, so absorb it before the table: the slider is a dial with a deliberate low end. The beginner reaches for maximum autonomy. The expert chooses the notch. Where you set it is a design decision, made per loop, per task — not a default you inherit from the hype.

The Four Types as Four Notches

Map the four loop types onto the slider and the taxonomy lifts from a list into a setting you choose:

Loop type	Trigger	Who holds the state machine	Slider position
1. Linear (Ralph, `/goal`)	You / a stop condition	Fixed code; a state file	Low — supervised; you hold the leash
2. Scheduled (cron)	A clock	Fixed code, decoupled from your attention	Low–mid — runs without you; structure still yours
3. Event-triggered	External state change	A durable external blackboard (the PR)	Mid — agents act; the medium holds state
4. Agent-initiated dynamic	The agent itself	The agent — it writes the state machine at runtime	High — self-structuring; the leash matters most here

Three Authorities, Three Distinct Jobs

This is the synthesis the whole field guide rests on, and it is worth keeping the three contributions clean rather than blurring them into one “loops are good” mush:

Peter Steinberger — the reframe

Stop prompting agents; design the loops that prompt them. The on-ramp (Chapter 1).

Theo Browne — the demonstrated taxonomy

The four types run in the wild, and the type-4 loop that writes its own sub-loops (Chapters 2 and 5).

Andrej Karpathy — the slider

The dial that turns the four types into a spectrum of handed-over control — and the leash that keeps it from running off.

Karpathy frames the larger shift as Software 3.0: “LLMs are a new kind of computer, and you program them in English.”⁸ The loop is how you program in English at scale — without holding the tool every turn. But “at scale” is not the same as “all the way to the right.” The slider has a low end for a reason, and the next two chapters are about what you put at that end so you can safely move the dial up.

Part II · A Loop That Writes Loops

The Phase Transition

Three of the four loops are loops you designed. The fourth designs its own sub-loops at runtime. That is not “more advanced.” It is a change of category.

Let us be precise about what makes Type 4 different, because “more advanced” is the lazy answer and it hides the actual insight.

Types 1 through 3 are loops you designed. You set the phases, the prompts, the branches. The structure is fixed; only the content changes turn to turn. Type 4 is a loop that designs its own sub-loops at runtime. On the slider, it is not a quantitative jump — it is the point on the dial where the agent stops running your state machine and starts writing one.

The Flagship, Dissected

Theo’s concrete instance is the one to keep in your head. In a PR-audit workflow, the agent did not call a workflow feature. It wrote roughly 240 lines of throwaway JavaScript that defined its own phases, schemas, prompts-as-functions, and a pipeline — and then that code orchestrated the sub-agents through it. As he described the moment: he asked the model whether he could make this loop, and it made a loop that makes sub-loops dynamically.

“You can never be more dynamic than code… when the agent can write code, it is effectively building its own custom feature every time.”

— Theo Browne (t3.gg)

The Inversion

Read that line twice, because it inverts how almost everyone thinks about code in agent systems. Code is usually the agent’s output — the thing it produces and you ship. Here, code is the step between model runs.

Letting the agent write the orchestration means the shape of the loop matches the shape of the problem — including problems you never anticipated when you built the harness. This is the agile instinct — build your own alternative shape around the problem — made literal, and made to happen at runtime rather than at design time. The author of a fixed primitive is guessing, in advance, at every situation the loop will meet. The agent writing code is not guessing; it is looking at the actual situation and reaching for the right control structure, the way an engineer would — except it reaches per problem, every time.

We have argued the same move from the other direction. The most profound capability of this era is not that agents write code; it is that they write code that writes code, build tools that build better tools. Type 4 is that idea pointed at the loop itself.

The Cost, Stated Plainly

This is not free, and pretending otherwise would be dishonest. Letting the agent write its own orchestration is non-deterministic — Theo notes the model sometimes wrote invalid JavaScript while pushing limits. It is more expensive. And you lose legibility: you can read your own pipeline, but you cannot fully predict the one the agent will write tonight.

When the trade is wrong — and when it is exactly right

✗ Regulated production code

• You need repeatability, audit, and a pipeline you can read
• Non-determinism is a liability, not a feature

Use a fixed primitive. The legibility is the point.

✓ Research and exploration

• You want the agent to surface structure you did not know to look for
• You are not optimising for repeatability

Let it write the loop. You are optimising for the agent discovering structure you didn’t anticipate.

For your context (AI research, building things that build things), the second case is the whole game. You are not paying for a pipeline you can certify; you are paying for a shape you could not have specified in advance.

Something That Can Say No

The single best line in the entire loop-engineering discourse was a reply to a tweet. It is the leash, stated operationally.

Under Steinberger’s tweet, in among the thousand replies arguing about what it meant, one landed harder than the original:

“Designing the loop is half of it. The other half is putting something in the loop that can say no: a test, a type check, a real error. A loop with nothing to push back is the agent agreeing with itself on repeat.”

— @mosyaseen, replying to @steipete

That is Karpathy’s leash, made concrete. And it explains the failure mode we have been carrying since Chapter 2: the Ralph loop’s error rate climbs on long runs because nothing external pushes back mid-flight. The loop runs out of friction, and a loop with no friction does not converge on reality — it converges on its own reflection.

The Grader Is Not the Worker

Anthropic’s supported /goal ships a closer as a first-class feature. After each turn, a separate fast evaluator model judges whether the success condition holds; the model doing the work is not the model grading it.⁶ That separation is the entire point. A loop grading its own homework is not autonomy. It is the agent agreeing with itself, with extra steps. Hope is not a closer.
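The separation can be sketched as follows. This is not how /goal is implemented; it is a minimal illustration in which `worker` and `evaluator` are hypothetical callables, and the only structural claim is that they are different processes.

```python
from typing import Callable


def goal_loop(
    goal: str,
    worker: Callable[[str, str], str],                  # (goal, feedback) -> attempt
    evaluator: Callable[[str, str], tuple[bool, str]],  # (goal, attempt) -> (ok, reason)
    max_turns: int = 10,
) -> tuple[str, bool]:
    """Loop until a separate evaluator accepts the attempt or turns run out."""
    feedback = ""
    attempt = ""
    for _ in range(max_turns):
        attempt = worker(goal, feedback)
        ok, feedback = evaluator(goal, attempt)  # a different process judges
        if ok:
            return attempt, True
    return attempt, False


def toy_worker(goal: str, feedback: str) -> str:
    # Toy stand-in: only adds tests once something has pushed back.
    return "draft with tests" if feedback else "draft"


def toy_evaluator(goal: str, attempt: str) -> tuple[bool, str]:
    # Toy stand-in for the closer: checks a condition the worker did not author.
    ok = "tests" in attempt
    return ok, ("" if ok else "no tests found")


attempt, accepted = goal_loop("ship the change", toy_worker, toy_evaluator)
```

The worker never sees the success condition, only the evaluator's objection. Swap `toy_evaluator` for one that always returns `True` and the first draft is accepted unchanged.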

The Closer and the Medium Are the Same Object

Here is the chapter’s real contribution, and it unifies the two halves of Part I. The thing that can say no and the durable external medium are usually the same object.

Two readings of one object

The PR

• Is the shared blackboard that survives an agent dying (Chapter 3)
• And is the place a red CI check can say no

The wiki

• Is the external state the next loop reads from (Chapter 8)
• And is the place a bad consolidation gets reverted in a diff

So “durability beats coordination” (Chapter 3) and “something that can say no” are the same insight from two sides. The durable external medium is simultaneously what survives an agent dying and what gives a second, adversarial process something to push back against. State held inside one agent can do neither — it cannot survive the agent, and it cannot be independently checked. The medium is the leash.

Mini-scenario: two closers

A research loop proposes “X supersedes Y” and writes it to the wiki. Closer A — the same model, asked “does this look right?” — rubber-stamps it; of course it does, it just wrote it. Closer B — a separate check asking “does Y still appear as a live dependency anywhere in the corpus?” — catches that Y is still load-bearing, and blocks the merge. Closer B is the only one that can actually say no, because it checks a different thing, against real state.
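The two closers can be put side by side in code. The corpus model below (a mapping from page name to the claim ids that page depends on) and the page names are hypothetical, chosen only to make the check concrete.

```python
def closer_a(claim: str) -> bool:
    """Self-review stand-in: the author approving its own output."""
    return True  # assumption: models the rubber stamp, not a real model call


def closer_b(superseded: str, corpus: dict[str, list[str]]) -> tuple[bool, list[str]]:
    """Approve 'X supersedes Y' only if no page still depends on Y."""
    dependents = sorted(
        page for page, deps in corpus.items() if superseded in deps
    )
    return (not dependents, dependents)


# Hypothetical corpus: page name -> ids of the claims it depends on.
corpus = {
    "retrieval.md": ["Y", "Z"],
    "ranking.md": ["X"],
}

rubber_stamp = closer_a("X supersedes Y")
approved, blockers = closer_b("Y", corpus)
```

`closer_a` takes the claim and returns a verdict without touching the corpus. `closer_b` never looks at the claim's wording at all; it checks real state, and returns the pages that block the merge.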

So design the loop, and design the no. A high-autonomy loop is only as trustworthy as the cheapest external thing that can stop it — and the cheapest such thing is usually already sitting in your durable medium, waiting to be wired up.

The Minimum Viable Leash

Everything so far has been the argument. This is the build — the leash you can rig tomorrow, with no new tools, that puts the state machine somewhere durable.

Chapters 3 through 6 made the case: put the state machine in a durable external medium, and give the loop something that can say no. This chapter is how. It is deliberately unglamorous, and it works tomorrow morning with the tools you already have.

The Four Moves

1. External state persistence

Get progress out of the context window into a file. A state.md tracking what is done, what is next, decisions made, and learnings. The agent reads it at session start and updates it as work completes.

Implementation: about five minutes.

2. Explicit completion criteria

Verifiable conditions, not “feels done.” For code: tests pass, lint is clean, and the PR description is written. For research: the core claim is supported by three or more sources, counter-evidence is addressed, and the synthesis is documented, not summarised.

3. Checkpoint discipline

After each significant step, stop and compress: what is done, what was learned, what is next. A fresh agent can pick up from the checkpoint without re-reading the whole history.

4. Context hygiene

Be aggressive about what enters the prompt; evict cold data. Each time you are about to paste something in, ask: “Does this need to be here?”

These four practices, implemented by hand, are documented to extend a productive session from roughly one hour to three or four with no new tooling.
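The second move is the easiest to make mechanical. A minimal sketch, assuming the research criteria above are recorded as facts in the state file: the `facts` keys and the `unmet` helper are hypothetical names, not part of any tool.

```python
from typing import Any, Callable

Facts = dict[str, Any]
Criteria = dict[str, Callable[[Facts], bool]]


def unmet(criteria: Criteria, facts: Facts) -> list[str]:
    """Return the names of the criteria that do not hold yet."""
    return [name for name, holds in criteria.items() if not holds(facts)]


# The research criteria from move 2, written as checks instead of a feeling.
RESEARCH_DONE: Criteria = {
    "core claim has three or more sources": lambda f: len(f["sources"]) >= 3,
    "counter-evidence addressed": lambda f: f["counter_evidence_addressed"],
    "synthesis documented": lambda f: f["synthesis_documented"],
}

# Hypothetical facts, as they might be read back from the state file.
facts: Facts = {
    "sources": ["source-1", "source-2"],
    "counter_evidence_addressed": True,
    "synthesis_documented": True,
}

remaining = unmet(RESEARCH_DONE, facts)
done = not remaining
```

The loop stops when `remaining` is empty, and until then the list itself is the next instruction: it names exactly what is still missing.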

They Are Not a Chore. They Are the Relocation.

This is the point most people miss, because the four moves look like hygiene chores and chores are easy to skip. They are not chores. They are the difference between a loop whose state machine lives in a place that survives a crash and one whose state machine lives in a context window that does not. Do them, and a Ralph loop stops being a thing that drifts and becomes a thing that holds — a crash is survivable, because the fresh agent reads the checkpoint and resumes; quality holds, because the criteria, not the vibes, decide when it is done.

Mini-case: same loop, different organisation

Loop A — Ralph, no external state

• Drifts around hour one
• Error rate climbs
• A crash loses everything

Loop B — the same loop, four moves

• A crash is survivable; the next agent resumes from the checkpoint
• Quality holds because criteria, not vibes, decide “done”
• Same model. Different organisation, different results.

Skills and Workflows: the Primitives That Keep It Compact

Two building blocks turn all of this from ad-hoc prompting into composable machinery. We treat them as known — this is not the place to re-teach the API — but their roles are worth naming because they map cleanly onto the two axes.

Skills are the reusable capability and memory layer: markdown plus optional executable scripts, discoverable, surviving across sessions. They encode “the way we investigate X.” The dynamic-context-injection trick, a command that runs before the skill loads, matters more for a research loop than for production code: a skill that injects the current wiki state as it loads is what makes the loop compound.

Workflows are the orchestration layer — the building block for breaking a fuzzy task into parallel investigations and synthesising them. In the field’s own shorthand, a loop is cron plus a decision-maker: the model picks the next action each tick, not a hardcoded branch.

Together they compose: skills are the verbs, workflows are the sentences, the loop is the paragraph. Re-deriving the same orchestration in every prompt is a tax you pay forever; named skills get cheaper over time, while ad-hoc prompts burn. That is the difference between building machinery and renting it one turn at a time.

The minimum viable leash — checklist

☑ External state in a file, not a context window
☑ Explicit, verifiable completion criteria
☑ A checkpoint after every meaningful step
☑ Ruthless context hygiene

Part III · The Same Doctrine, Pointed at Your Work

Accumulate, or Become the Substrate

For the builder whose loop produces knowledge rather than code, there is one test that sorts the loops worth building from the ones that only look busy.

Everything so far applies to any loop. This chapter is written for the reader the whole book is aimed at — the one building things that build things, where the loop’s product is not merged code but knowledge. For that reader, there is a single test that does the sorting.

The compounding test

• A loop that merely accumulates findings — writes entries you later read — is linear.
• A loop whose output becomes the substrate the next loop reads from — structured entries a future agent loads as priors to decide what to investigate next — is the one that genuinely compounds.

The loop’s real product is knowledge that makes the next loop better.

The Trap Is the Obvious Half

Most people building a knowledge loop build the accumulate version, because it is the obvious half: loops write entries, you read them, it grows. It also gets slower to think with as it grows, because an append-only pile is just the swamp with extra steps. A wiki that grows without getting smarter is not an asset. It is a knowledge graveyard.

The compounding version requires two things the accumulate version skips. Neither is optional.

Requirement One: Structure It to Be Read Back

The entries have to be structured enough to be read back as input — not prose, but claims and edges, so a future agent navigates the relationships instead of re-inferring them from similarity every time. That structure is the difference between a loop that runs and a loop that compounds. Karpathy independently named the artefact the “LLM Wiki” — a persistent, interlinked set of markdown files where the knowledge is “compiled once and then kept current, not re-derived on every query.”¹²
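What “claims and edges” might look like as data, in a deliberately small sketch. The `Claim` and `Edge` shapes, the relation names, and the example entries are all assumptions for illustration; the source does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    id: str
    text: str
    sources: list[str] = field(default_factory=list)


@dataclass
class Edge:
    src: str
    relation: str  # hypothetical vocabulary: "supported_by", "contradicts", ...
    dst: str


def priors_for(
    claim_id: str, claims: dict[str, Claim], edges: list[Edge]
) -> list[tuple[str, Claim]]:
    """Follow explicit edges out of one claim instead of re-inferring them."""
    return [
        (edge.relation, claims[edge.dst])
        for edge in edges
        if edge.src == claim_id and edge.dst in claims
    ]


claims = {
    "c1": Claim("c1", "Loops need an external closer", ["note-a"]),
    "c2": Claim("c2", "Self-graded loops drift on long runs", ["note-b"]),
    "c3": Claim("c3", "More agents always help"),
}
edges = [
    Edge("c1", "supported_by", "c2"),
    Edge("c3", "contradicts", "c1"),
]

priors = priors_for("c1", claims, edges)
```

A future agent calling `priors_for` gets the relationship and the related claim directly. With prose entries it would have to rediscover that link by similarity on every run.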

Requirement Two: A Step That Subtracts

The half almost everyone skips. A loop that only writes is a knowledge graveyard. The fix is a second agent whose job is to subtract: combine redundant claims, fade stale ones, fold flat facts into edges, spin clusters into their own pages. We call it the Janitor. It is reflection, running on your corpus — the same mechanism that kept long-horizon agents coherent in the Generative Agents research, where removing the reflection step degraded them into repetition.¹³

The index didn’t just get smaller. It got smarter with every janitorial pass.

If you build only one loop for your wiki, build the consolidation loop — not the ingestion loop. The ingestion is the obvious half; the Janitor is the half that makes it compound. Most people build the writer and skip the compactor, and then wonder why their knowledge base gets slower and dumber as it fills.
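A subtracting pass can be sketched without any model at all. This is a crude stand-in, not the Janitor itself: redundancy is approximated by normalised text, staleness by a hypothetical 180-day cutoff, and a real pass would use an agent for both judgments. What the sketch keeps is the contract: fewer entries out than in, plus a log that makes every removal reviewable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    text: str
    last_confirmed: date


def janitor_pass(
    entries: list[Entry], today: date, stale_after_days: int = 180
) -> tuple[list[Entry], list[str]]:
    """Drop stale entries and merge redundant ones; log every subtraction."""
    kept: dict[str, Entry] = {}
    log: list[str] = []
    for entry in entries:
        if (today - entry.last_confirmed).days > stale_after_days:
            log.append(f"faded: {entry.text}")
            continue
        key = " ".join(entry.text.lower().split())  # crude redundancy test
        if key in kept:
            log.append(f"merged: {entry.text}")
            # Keep whichever copy was confirmed most recently.
            if entry.last_confirmed > kept[key].last_confirmed:
                kept[key] = entry
            continue
        kept[key] = entry
    return list(kept.values()), log


entries = [
    Entry("Loops need a closer", date(2026, 5, 1)),
    Entry("loops  need a closer", date(2026, 6, 1)),
    Entry("Old finding nobody re-checked", date(2025, 1, 1)),
]
compacted, log = janitor_pass(entries, today=date(2026, 6, 10))
```

Returning the log alongside the result is the design choice that matters: it is what lets a bad consolidation be spotted and reverted rather than silently absorbed.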

The Test as a Question

It comes down to one question about where your loop’s output goes. Does it land in a pile you read, or in a structured, self-compacting map that the next loop consults as priors before it decides what to do? Theo’s better workflows already did the compounding version without naming it — his PR audit loaded priors and clusters from prior runs. The first kind of loop is a scheduled sweep with a weak closer. The second is the machine you actually want.

A loop that runs leaves nothing behind. A loop that compounds leaves a sharper map — one that changes what the next loop is even able to ask.

Strip the Theatre, Keep the Apparatus

How to copy Theo’s architecture without copying his economics — because the dazzle and the asset are two different things.

We promised, back in Chapter 2, to factor out the spectacle. Here it is.

It would be easy to watch Theo’s demos and conclude that the lesson is scale — 55-agent tournaments, hundreds of dollars to pick between three PRs, “going hard,” thousands of dollars of inference in ten days. It is not. That intensity is a function of his temporary economics: he is optimising to burn heavily subsidised inference before it disappears, explicitly not for value-per-token. The 55-agent tournament to choose among three PRs is theatre.

What to Extract Instead

The thing to extract is the architecture: dynamic, self-structuring, artifact-producing loops with durable external state and an adversarial closer — everything Chapters 3, 5 and 6 built. Theo’s PR audit was good not because it ran a tournament but because it had a verify phase: an adversarial checker per ruling that tried to refute it against the real repo. Strip that phase and the tournament is just expensive agreement — the correlated-checkers pitfall from Chapter 6, at scale, on fire.

The structure that mattered — audit → rule → verify, with priors loaded from memory and an adversarial check at the end — works at a tenth the agent count. The agents were never the point. The phases were.
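The three phases fit in one short function. This is a sketch of the shape only, not Theo's workflow: the `repo` mapping, the three callables, and the toy rules are hypothetical stand-ins for agent calls and a real checkout.

```python
from typing import Callable


def audit_rule_verify(
    repo: dict[str, str],                # hypothetical: path -> file content
    priors: dict[str, str],              # notes loaded from a previous run
    audit: Callable[[str, str], str],    # (content, prior) -> finding
    rule: Callable[[str], str],          # finding -> ruling
    refute: Callable[[str, str], bool],  # (ruling, content) -> True if refuted
) -> tuple[dict[str, str], list[str]]:
    """Run audit, rule, verify per file; only unrefuted rulings survive."""
    accepted: dict[str, str] = {}
    rejected: list[str] = []
    for path, content in repo.items():
        finding = audit(content, priors.get(path, ""))
        ruling = rule(finding)
        if refute(ruling, content):      # checked against the real content
            rejected.append(path)
        else:
            accepted[path] = ruling
    return accepted, rejected


repo = {"a.py": "def f(): return 1", "b.py": "def g(): pass  # TODO"}
accepted, rejected = audit_rule_verify(
    repo,
    priors={"b.py": "flagged incomplete last run"},
    audit=lambda content, prior: f"{len(content)} chars; prior: {prior or 'none'}",
    rule=lambda finding: "approve",      # an over-eager ruler
    refute=lambda ruling, content: ruling == "approve" and "TODO" in content,
)
```

The ruler approves everything, which is the tournament without its verify phase. The only reason `b.py` is rejected is that `refute` looks at the file rather than at the ruling's confidence.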

This is a meme our own work already metabolises. The temptation to read “whoever spends the most tokens wins” as a licence to burn is a misreading. The full claim is: whoever spends the most tokens wins — provided they are spending inside an apparatus that knows what to reject. The load-bearing word is discipline, not burn. The burn is not the wedge. The apparatus is the wedge.

The token-maxing is the theatre. The apparatus is the asset.

Myth vs reality

✗ Myth

More agents and more tokens make a better loop. The dazzle is the value.

✓ Reality

A cron docs-sweep with a real test gate beats a dynamic 55-agent workflow scored only against itself. The verify phase, not the agent count, is what made the tournament worth anything.

Mini-case: two PR-audit loops

Loop A — the tournament

• 55 agents, no verify phase
• Scored against each other
• Result: $400 of agents agreeing

Loop B — the apparatus

• 5 agents, audit → rule → verify
• Each ruling refuted against the real repo; priors loaded from the last run
• Result: the same useful answer at a tenth the cost — and it left priors behind

Take the architecture. Leave the intensity. The recreational version of Theo’s setup is fun to watch and wrong to copy; the disciplined version of it is exactly the loop you want, pointed at your wiki, running quietly at a scale you can afford forever.

Part III · The Same Doctrine, Pointed at Your Work

Three Loops You’d Actually Build

Not three frameworks — one doctrine, pointed at three contexts. The same two axes and the same slider classify all of them.

The doctrine is portable, and this chapter proves it. Here are three loops worth building for compounding research. Each is the same two axes — trigger crossed with who-holds-the-state — pointed at a different context. For each, we ask the same four questions: name the trigger, name where state lives, name the closer, apply the compounding test.

Variant A — The compounding research wiki loop

Trigger: event or agent-initiated — a new source arrives, or the agent decides to investigate a gap.
State machine: the wiki-graph itself — the durable external medium, the third rung.
Closer: the Janitor consolidation as a revertible PR, governed by a North Star. A bad merge is a reverted commit, not a silent mutation.
Compounds? Yes — output is structured claims and edges the next loop loads as priors. This is your flagship use case.

Pitfall: hallucinated consolidation — a Janitor under a loose directive can merge two genuinely distinct ideas because they were old and adjacent. The closer must include chronological legibility and a revertible diff, not just a North Star.
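
One way to picture a revertible consolidation is to express every merge as a diff that records exactly what it removed, so undoing it restores the graph. The sketch below assumes a toy wiki-graph held as a dict of claim id to claim text; `Patch`, `propose_merge`, `apply_patch` and `revert_patch` are illustrative names, not part of any real Janitor implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

Graph = Dict[str, str]  # claim id -> claim text (a stand-in for the wiki-graph)


@dataclass
class Patch:
    """A consolidation expressed as a diff, so it can be undone exactly."""
    removed: Graph
    added: Graph
    reason: str


def propose_merge(graph: Graph, ids: List[str], merged_id: str,
                  merged_text: str, reason: str) -> Patch:
    # Keep the full text of everything being removed; that is what makes the revert exact.
    return Patch(
        removed={claim_id: graph[claim_id] for claim_id in ids},
        added={merged_id: merged_text},
        reason=reason,
    )


def apply_patch(graph: Graph, patch: Patch) -> Graph:
    # Returns a new graph; the input graph is left untouched.
    result = {k: v for k, v in graph.items() if k not in patch.removed}
    result.update(patch.added)
    return result


def revert_patch(graph: Graph, patch: Patch) -> Graph:
    # Inverse of apply_patch: drop what was added, restore what was removed.
    result = {k: v for k, v in graph.items() if k not in patch.added}
    result.update(patch.removed)
    return result


WIKI = {
    "c1": "Cron loops need a closer.",
    "c2": "Event loops need a closer.",
    "c3": "State lives outside the agent.",
}
PATCH = propose_merge(WIKI, ["c1", "c2"], "c4",
                      "Every loop needs a closer.", reason="near-duplicates")
MERGED = apply_patch(WIKI, PATCH)
RESTORED = revert_patch(MERGED, PATCH)
```

Because the patch carries its own `reason` and the removed text, a reviewer can see what was merged and why, and a bad merge costs one `revert_patch` call rather than a reconstruction from memory.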

Variant B — The overnight scheduled research sweep

Trigger: cron — a clock.
State machine: a state file plus the wiki.
Closer: explicit completion criteria plus a separate evaluator (Chapters 6 and 7).
Compounds? Only if a consolidation pass runs. Otherwise it is accumulate-only: a scheduled sweep with a weak closer, producing a pile that gets slower to think with as it grows.

Pitfall: the accumulate-only trap — the most common failure for a scheduled loop. You wake up to more entries, none of them compacted, and the pile thinks slower every week.
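
A minimal sketch of a scheduled sweep that avoids the trap might look like the following: state lives in a JSON file, and each run ends with a consolidation pass before the file is written back. The file name, the entry shape and the deliberately naive `consolidate` rule (merge entries whose claim text matches after normalising) are all assumptions made for illustration; a real pass would need a far better notion of "same claim".

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List

Entry = Dict[str, object]  # {"claim": str, "sources": [str, ...]}


def load_state(path: Path) -> dict:
    """Read the state file, or start fresh on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"runs": 0, "entries": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then swap it in, so a crash mid-write leaves the old state intact."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def consolidate(entries: List[Entry]) -> List[Entry]:
    """Naive compaction: merge entries whose claim text matches after normalising."""
    merged: Dict[str, Entry] = {}
    for entry in entries:
        key = str(entry["claim"]).strip().lower()
        if key in merged:
            combined = set(merged[key]["sources"]) | set(entry["sources"])
            merged[key]["sources"] = sorted(combined)
        else:
            merged[key] = {"claim": entry["claim"], "sources": sorted(entry["sources"])}
    return list(merged.values())


def nightly_sweep(path: Path, found: List[Entry]) -> dict:
    """One scheduled run: load state, add findings, consolidate, persist."""
    state = load_state(path)
    state["entries"] = consolidate(state["entries"] + found)
    state["runs"] += 1
    save_state(path, state)
    return state


# Two simulated nights against a throwaway state file.
with tempfile.TemporaryDirectory() as workdir:
    state_file = Path(workdir) / "sweep_state.json"
    nightly_sweep(state_file, [{"claim": "Loops need closers", "sources": ["a"]}])
    FINAL = nightly_sweep(state_file, [
        {"claim": "loops need closers ", "sources": ["b"]},
        {"claim": "State lives outside the agent", "sources": ["c"]},
    ])
```

Without the `consolidate` call the second night would leave three entries instead of two. That is the accumulate-only behaviour in miniature.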

Variant C — The event-triggered multi-agent blackboard

Trigger: event — CI goes red, a review comment lands, a new artifact appears.
State machine: the PR or shared artifact — durable and external.
Closer: the CI / red check — durable and adversarial.
Compounds? Partially — it compounds if the loop writes its learnings back to the wiki (the Friday-deploy edge); otherwise the PR is durable but the learning evaporates with the merge.

Shape: two agents unaware of each other (Chapter 2) is the canonical form here — durability beats coordination (Chapter 3).

Variant	Trigger	Where state lives	Closer	Compounds?
A. Wiki loop	event / agent	the wiki-graph	revertible Janitor PR + North Star	Yes
B. Overnight sweep	cron	state file + wiki	criteria + separate evaluator	Only with a consolidation pass
C. Event blackboard	event	the PR / artifact	CI / red check	Only if learnings written back

Notice what the table is actually showing. The same map from Chapter 3 and the same slider from Chapter 4 classify all three loops. You do not need a new framework for each loop you build. You need to ask the same four questions of each one. The doctrine is the reusable thing; the loops are just the doctrine pointed somewhere.

Activity or Asset?

The whole book has been building one test you can run in five minutes on a loop you already operate. Here it is.

Take one loop you run today. Walk it through these five steps. It costs you five minutes and it will tell you whether you have built an asset or an activity.

The five-step field guide

1. Name its trigger.

Manual, clock, event, or agent-initiated? The cheap axis — answer it and move on. (Chapter 2)

2. Name where its state lives.

Your head, fixed code you wrote ahead of time, or a durable external medium? The axis that matters. (Chapter 3)

3. Kill it mid-run, in your imagination.

If the agent died right now, does the state survive? If “no,” the loop is held together by attention, not architecture. (Chapters 3, 7)

4. Find the thing that can say no.

A test, a separate evaluator, a red CI check, a revertible diff — or is the loop grading its own homework? (Chapter 6)

5. Apply the compounding test.

Does the output land in a pile you read, or become a structured prior the next loop reads from? If it only accumulates, you built the obvious half. (Chapter 8)

The verdict

If you cannot answer steps 2 through 5, the loop is not yet an asset. It is an activity. Activity feels like progress and leaves nothing behind. An asset is a loop whose spend leaves a sharper map.
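
For readers who prefer a checklist they can run, the five steps can be sketched as a small function. This encodes one strict reading of the verdict, in which a missing or failing answer to any of steps 2 through 5 counts as a gap; the field names and the string labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopAnswers:
    trigger: Optional[str]          # step 1: "manual", "clock", "event" or "agent"
    state_location: Optional[str]   # step 2: "head", "fixed code" or "durable external"
    survives_kill: Optional[bool]   # step 3: does state survive the agent dying mid-run?
    external_closer: Optional[str]  # step 4: e.g. "red CI check"; None if self-graded
    becomes_prior: Optional[bool]   # step 5: does the next run read the output as a prior?


def gaps(answers: LoopAnswers) -> List[str]:
    """List every step from 2 to 5 that is unanswered or answered badly."""
    found = []
    if answers.state_location != "durable external":
        found.append("step 2: state is not in a durable external medium")
    if not answers.survives_kill:
        found.append("step 3: state would not survive the agent dying")
    if not answers.external_closer:
        found.append("step 4: nothing external can say no")
    if not answers.becomes_prior:
        found.append("step 5: output accumulates but is not read as a prior")
    return found


def verdict(answers: LoopAnswers) -> str:
    # Step 1 (the trigger) is recorded but never decides the verdict.
    return "activity" if gaps(answers) else "asset"


SWEEP_NO_CONSOLIDATION = LoopAnswers(
    trigger="clock",
    state_location="durable external",
    survives_kill=True,
    external_closer="separate evaluator",
    becomes_prior=False,
)
WIKI_LOOP = LoopAnswers(
    trigger="event",
    state_location="durable external",
    survives_kill=True,
    external_closer="revertible Janitor PR",
    becomes_prior=True,
)
```

Calling `gaps(SWEEP_NO_CONSOLIDATION)` names the single missing piece, which is usually more useful than the verdict itself.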

The Whole Arc, in Three Sentences

The reframe Steinberger named — from prompting agents to designing the loops that prompt them — only pays off when three things are true. The loop’s state machine lives somewhere durable (Chapter 3). Something external can say no (Chapter 6). And the output compounds into the substrate the next run reads from (Chapter 8). Get those three right and you can leave the leash long — the high end of Karpathy’s slider — because the loop is holding itself.

Get them wrong, and no amount of autonomy, agents, or tokens will save you from a loop agreeing with itself on repeat. Designing the loop is just procrastination with better posture if there is no closer at the end of it and no asset left behind.

The one line to carry around

Design the loop. But spend most of your design budget on the part the discourse keeps skipping — not what starts it, but who holds its state machine, and whether the spend leaves an asset behind.

Read The Index Is the Data for the substrate these loops feed. Then run the five-step test on one real loop this week, and reclassify it.

Sources & Evidence

References & Sources

The evidence base behind every claim — primary research, industry analysis, and technical specifications

Research Methodology

This ebook draws on primary research from standards bodies, independent research firms, enterprise technology vendors, and consulting firms. Statistics cited throughout have been cross-referenced against primary sources.

Frameworks and interpretive analysis developed by Scott Farrell / LeverageAI are listed separately below — these represent the practitioner lens through which external research is interpreted, and are not cited inline to avoid self-promotional appearance.

Social Posts & Talks

Peter Steinberger (@steipete) — Designing loops that prompt your agents [1]

His tweet reached ~6.5M views; the spark for the loop-engineering discourse

https://x.com/steipete/status/2063697162748260627

Boris Cherny (Head of Claude Code, Anthropic) — Acquired Unplugged, presented by WorkOS (June 2, 2026) [2]

"I don't prompt Claude anymore... My job is to write loops."

https://www.youtube.com/watch?v=RkQQ7WEor7w

Industry Analysis & Vendor Research

Addy Osmani — Loop Engineering [3]

Named the discipline of writing programs that prompt coding agents

https://addyosmani.com/blog/loop-engineering/

The Register — Ralph Wiggum loop prompts Claude to vibe-clone software [5]

Huntley coined Ralph (July 2025): a bash loop feeding output back until correct; brute force meets persistence

https://www.theregister.com/2026/01/27/ralph_wiggum_claude_loops/

The New Stack — Loop Engineering [6]

The loop lineage: ReAct to AutoGPT to Ralph to /goal to orchestration loops

https://thenewstack.io/loop-engineering/

Theo Browne (t3.gg) — Agentic loops (video) [10]

You can never be more dynamic than code; the agent writing code builds its own feature each time

https://www.youtube.com/watch?v=iJVJwmCKW9o

LeverageAI / Scott Farrell — Practitioner Frameworks

The interpretive frameworks, architectural patterns, and practitioner analysis in this ebook were developed through enterprise AI transformation consulting. The articles below are the underlying thinking behind those frameworks. They are listed here for transparency and further exploration — not cited inline, as this is the author's own analytical voice.

Scott Farrell, LeverageAI — Breaking the 1-Hour Barrier

Stateless workers + stateful orchestration; state persists externally; the 1-hour ceiling is an architecture choice

https://leverageai.com.au/breaking-the-1-hour-barrier-ai-agents-that-build-understanding-over-10-hours/

Scott Farrell, LeverageAI — The Enterprise AI Spectrum

Seven-level autonomy ladder; Levels 5-6 agentic loops, Level 7 self-extending

https://leverageai.com.au/the-enterprise-ai-spectrum-a-systematic-approach-to-durable-roi/

Scott Farrell, LeverageAI — The Agent Token Manifesto

Self-improving loops: agents write code that writes code, build tools that build better tools

https://leverageai.com.au/

Scott Farrell, LeverageAI — Context Engineering

Late binding, dynamic context injection; load just-in-time so a skill can inject current state as it loads

https://leverageai.com.au/context-engineering-why-building-ai-agents-feels-like-programming-on-a-vic-20-again/

Scott Farrell, LeverageAI — The Index Is the Data

Pre-digest the corpus into a wiki-graph of claims and edges; the index becomes the data

https://leverageai.com.au/the-index-is-the-data-how-a-self-cleaning-wiki-graph-out-thinks-rag/

Scott Farrell, LeverageAI — The Cognition Dimension Ladder

Disciplined cognition: spend tokens inside an apparatus that knows what to reject; discipline not burn

https://leverageai.com.au/the-cognition-dimension-ladder-why-your-ai-strategy-is-one-rung-too-low/

Primary Research & Standards Bodies

Andrej Karpathy (YC AI Startup School, June 2025) — Software Is Changing (Again) [7]

The autonomy slider, drawn from Tesla Autopilot; tune autonomy per task

https://www.youtube.com/watch?v=LCEmiRjPEtQ

Andrej Karpathy — Software 3.0 announcement [8]

LLMs are a new kind of computer programmed in English (Software 3.0)

https://x.com/karpathy/status/1935518272667217925

Andrej Karpathy — Software Is Changing (Again) [9]

Tune the amount of autonomy you give up per task

https://www.ycombinator.com/library/MW-andrej-karpathy-software-is-changing-again

Andrej Karpathy (GitHub Gist, April 2026) — LLM Wiki [12]

The LLM incrementally builds a persistent interlinked markdown wiki; knowledge compiled once, kept current

https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Park et al., UIST 2023 (arXiv:2304.03442) — Generative Agents: Interactive Simulacra of Human Behavior [13]

Reflection synthesises higher-order memories; removing it degrades long-horizon coherence

https://arxiv.org/abs/2304.03442

About This Reference List

Compiled June 2026. All URLs verified at time of compilation. Regulatory documents and standards specifications are subject to revision — check primary sources for the most current versions.

Some links to academic papers and vendor research may require free registration. Government and standards body publications are freely accessible.