Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20 Again

Scott Farrell · November 4, 2025

TL;DR

  • LLM context windows operate like virtual memory systems—paging information in and out based on task demands, not keeping everything loaded at once.
  • Old-school memory constraints (VIC-20’s 3.5KB RAM) taught programmers to write tight, efficient code. Today’s AI context limits demand the same discipline: clarity, modularity, and ruthless pruning.
  • Context isn’t just capacity—it’s attention. Cluttered context diffuses a model’s focus, degrading output quality even when token limits aren’t reached.

Introduction

I started programming on a VIC-20 with 3.5 kilobytes of RAM. You read that right—3.5KB. Not megabytes. Not gigabytes. Three and a half thousand bytes to hold your entire program, all your variables, and whatever you were trying to display on screen.

It was a beautiful tyranny. Every byte mattered. You learned to write tight loops, reuse memory buffers, precompute lookup tables, and sometimes even self-modify code mid-execution just to squeeze out a few more bytes. When I upgraded to a Commodore 64, with roughly 38KB of RAM free for BASIC, I felt like I’d been handed the keys to a supercomputer.

Fast forward four decades, and I’m working with AI agents powered by large language models. These systems have context windows measured in hundreds of thousands of tokens—roughly 250,000 tokens in many modern systems, which translates to something like 180,000 words of text. That’s orders of magnitude beyond anything we dreamed of in the 8-bit era.

And yet, building effective AI agents today feels remarkably like programming on that VIC-20 again.

Why? Because context isn’t free. Every token you load into an LLM’s context window imposes a cost—not just in computational resources, but in cognitive clarity. Just as the VIC-20 forced us to be surgical about memory usage, today’s AI systems are teaching us that context engineering—the art of deliberately managing what information lives in an agent’s “working memory”—is the new frontier of performance optimization.

Context as Virtual Memory: The New Paging System

In operating systems, virtual memory allows programs to behave as if they have unlimited RAM by paging data in and out of physical memory on demand. The OS maintains a page table, tracks what’s “hot” versus “cold,” and swaps intelligently to keep the CPU fed with what it needs.

Building AI agent systems today demands exactly the same discipline—but instead of managing bytes, we’re managing semantic information.

Think of the LLM’s context window as your physical RAM. It’s large, but finite. Everything else—your codebase documentation, your project briefs, your style guides, your tool definitions—is like virtual memory sitting on disk. The key is to page the right information in at the right moment, use it, then page it back out to make room for what comes next.

In practice, this means treating context like a carefully scheduled resource:

  • Late binding: Don’t load tools, schemas, or reference documents until the task explicitly requires them.
  • Modular inclusion: Store project knowledge in discrete markdown files (project.md, brand.md, slides.md) and pull them into context only when the current task actually calls for them.
  • Ephemeral workspaces: Use sub-agents as isolated sandboxes that load their own narrow context, complete their work, emit a clean artifact, then vanish—taking all their scratchpad clutter with them.

This isn’t just an efficiency trick. It’s a cognitive discipline. When you treat context as a paging system, you’re forced to think clearly about dependencies, task boundaries, and what information is truly load-bearing versus what’s just noise.
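
To make the paging idea concrete, here's a minimal sketch in Python. The context_modules directory, the file names, and the helper functions are illustrative assumptions on my part, not a prescribed API:

    from pathlib import Path

    MODULE_DIR = Path("context_modules")  # assumed location of project.md, brand.md, slides.md

    def load_module(name: str) -> str:
        """Page a markdown module into the live context only when a task asks for it."""
        path = MODULE_DIR / f"{name}.md"
        return path.read_text() if path.exists() else ""

    def build_context(task: str, required_modules: list[str]) -> str:
        """Assemble a prompt that holds only what this task declared it needs."""
        sections = [f"# Task\n{task}"]
        sections += [f"# Module: {name}\n{load_module(name)}" for name in required_modules]
        return "\n\n".join(sections)

    # A documentation task mounts project.md and brand.md; slides.md stays on "disk".
    prompt = build_context(
        task="Draft the README overview section.",
        required_modules=["project", "brand"],
    )

The point is the shape of the discipline: nothing enters the prompt unless the task declared it, and nothing lingers after the prompt is built.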

The Tyranny of Small Spaces: What the VIC-20 Taught Me

When you’re working with 3.5KB of RAM, you learn lessons that stay with you forever:

Every byte tells a story. You don’t waste space on long variable names, verbose comments, or redundant data structures. You think hard about what absolutely must exist in memory at runtime, and everything else gets computed on the fly or stored implicitly in code structure.

Simplicity compounds. The tighter your inner loops, the more functionality you can fit in the same space. A ten-line function that does one thing well is worth its weight in gold. Complexity is expensive—not morally, but literally, in bytes consumed.

Clarity is survival. When memory is scarce, you can’t afford tangled state or confusing control flow. If you can’t trace exactly what’s in RAM at any moment, you’ll overwrite something critical and crash. Discipline wasn’t optional; it was the cost of making anything work.

When I moved to the Commodore 64, I had 10x the memory—but the habits didn’t go away. I kept writing lean code because I’d internalized that bloat is a choice, not a necessity. And I kept winning performance headroom because my programs were structurally efficient, not just lucky enough to fit.

Those lessons disappeared in the 64-bit computing era. Modern developers can afford to be sloppy. Most of us load entire libraries for a single function call, keep frameworks in memory “just in case,” and barely think about object lifecycle management. RAM is cheap. CPU cycles are plentiful. Garbage collectors clean up our mess.

But now, with LLMs, we’re back in that VIC-20 constraint space—just wearing different clothes.

Context Isn’t Just Capacity—It’s Attention

Here’s where the LLM constraint diverges from the VIC-20 in a crucial way: it’s not just about running out of room. It’s about staying focused.

When the VIC-20 ran out of RAM, it simply stopped working. You got an “OUT OF MEMORY” error, and that was that. The machine didn’t get stupider as you filled memory—it just hit a wall.

LLMs behave differently. They don’t crash when context fills up; they diffuse.

Every token in the context window participates in the model’s attention mechanism. When the agent reads your latest instruction, it’s simultaneously weighing that input against everything else in view—prior messages, tool definitions, code snippets, documentation fragments, conversation history. The more clutter sits in context, the more the model’s attention spreads thin. It’s like trying to hold a conversation in a room where fifty other people are whispering different conversations nearby. You don’t lose the ability to talk, but your focus degrades.

I’ve watched this happen in real time. When I had four Model Context Protocol (MCP) servers installed—each exposing tool schemas, descriptions, parameter definitions—they consumed nearly a quarter of my context window. Not because they were actively being used, but simply because they existed in scope. Every time the agent planned its next action, it had to evaluate and reject dozens of irrelevant tools before proceeding. That’s cognitive overhead, and it manifests as slower reasoning, vaguer outputs, and occasionally, self-imposed shortcuts where the model decides to “conserve tokens” even though it has plenty of headroom.

Once I cleared out the unused MCPs and tightened the context, the quality delta was immediate and dramatic. Responses sharpened. Reasoning chains got deeper. The agent stopped second-guessing itself about token budgets.

This is the hidden cost of context pollution: you’re not just wasting space—you’re actively dumbing down the system.

Context Hygiene: The Markdown Operating System

So how do you keep context clean without sacrificing capability?

The answer I’ve landed on is to treat the agent’s working environment like an operating system with a deliberate memory hierarchy. I call it the Markdown OS—a lightweight, file-based context management system where each .md file is a semantic module that gets paged in only when required.

Tier 0 (Live context): Current task notes, minimal tool schemas, and the immediate work artifact. This is your L1 cache—fast, hot, and tiny.

Tier 1 (Warm references): Headers and tables of contents for available modules. The agent can see what’s available without loading the full payload. Think of this as your page table.

Tier 2 (Cold storage): Full markdown bodies, historical logs, prior experiments, and completed work. These live on “disk” and are pulled in only when explicitly requested.

Each markdown module begins with a short header declaring its purpose, required inputs, allowed tools, and expected outputs. The agent reads headers first, decides if a module is relevant, then pulls the body only if needed. This creates a deterministic contract system: tasks know what they need, modules know what they provide, and the intersection happens just-in-time.

For example:

  • Writing a landing page? Mount offer.md and voice.md.
  • Generating slides? Add slides.md and brand.md.
  • Fetching stock images? Don’t pollute the main context at all—spin up a sub-agent with image_tools.md, let it do its work in isolation, and return only the final artifact.

This approach has several compounding benefits:

High signal density: Most tokens in the live context are directly about the current task, not “just in case” scaffolding.

Shorter planning loops: The agent spends less time evaluating irrelevant options (“not that tool, not that module…”) before taking action.

Predictable forgetting: Completed work leaves behind clean artifacts and a tiny receipt (inputs used, outputs produced), not conversational sludge.
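
Here's a rough sketch of that header-first mounting, again in Python. The twelve-line header budget and the keyword matching are simplifications I'm assuming for illustration; a real system would match module contracts against the task more carefully:

    from pathlib import Path

    def read_header(path: Path, max_lines: int = 12) -> str:
        """Tier 1: read only the module's declared contract (purpose, inputs, tools, outputs)."""
        with path.open() as f:
            return "".join(line for _, line in zip(range(max_lines), f))

    def mount_if_relevant(path: Path, task_keywords: set[str]) -> str | None:
        """Tier 2: pull the full body only when the header matches the current task."""
        header = read_header(path).lower()
        if any(keyword in header for keyword in task_keywords):
            return path.read_text()
        return None

    # Generating slides: only modules whose headers mention slides or brand get mounted.
    mounted = [
        body
        for md in sorted(Path("context_modules").glob("*.md"))
        if (body := mount_if_relevant(md, {"slides", "brand"})) is not None
    ]

Headers are cheap to read (Tier 1); bodies are mounted only on a match (Tier 2); everything else never leaves cold storage.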

Sub-Agents as Ephemeral Sandboxes

One of the most powerful patterns I’ve discovered is using sub-agents as disposable execution contexts—isolated workspaces that handle high-churn, low-global-context tasks.

Here’s the pattern:

  1. Main agent identifies a task that’s self-contained but messy (e.g., generating images, pulling data, compiling slides).
  2. Spin up a sub-agent with a minimal brief and exactly the markdown modules it needs—no history flood, no tool zoo.
  3. Sub-agent completes its work, emits a structured artifact (a file, a JSON blob, a summary), and terminates.
  4. Main agent receives only the artifact—none of the sub-agent’s trial-and-error, retries, API logs, or intermediate failures leak back into the parent context.

This is the semantic equivalent of running a subprocess with its own memory space. When the process exits, its RAM is reclaimed. Only designated outputs persist.
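
A minimal sketch of the pattern, with run_llm standing in for whatever model call or agent runtime you actually use, and a deliberately naive "work, then distill" flow:

    from pathlib import Path

    MODULE_DIR = Path("context_modules")  # assumed location of the markdown modules

    def run_llm(prompt: str) -> str:
        """Stand-in for a real model call; swap in your API client or local runtime."""
        return f"[model output for a {len(prompt)}-character prompt]"

    def run_subagent(brief: str, modules: list[str]) -> str:
        """Ephemeral sandbox: narrow context in, one clean artifact out, scratch discarded."""
        parts = [brief] + [
            p.read_text() for m in modules if (p := MODULE_DIR / f"{m}.md").exists()
        ]
        scratch = run_llm("\n\n".join(parts))   # retries, tool noise, dead ends live here
        artifact = run_llm(f"Return only the final result as a clean artifact:\n{scratch}")
        return artifact                         # the parent never sees the scratchpad

    # The main agent delegates image sourcing without polluting its own context.
    images_manifest = run_subagent(
        brief="Find three stock images for the launch page and list their URLs.",
        modules=["image_tools"],
    )

However the sub-agent gets its work done, the contract is the same: the parent receives the artifact and nothing else.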

Why does this matter? Because messy tasks generate messy context. Image generation might involve retry logic, rate limiting, failed downloads, and iterative refinement. If all of that lives in your main context, it accumulates like sludge. By the time you’re ten tasks deep in a project, half your context is historical junk that the agent keeps re-weighing during attention.

Sub-agents solve this by making forgetting structural. The scratchpad is deleted by design. Only the final, validated output returns—clean, documented, ready to use.

Memory Tiers Over One Big Soup

The Markdown OS and sub-agent patterns are both expressions of a deeper principle: memory tiers beat monolithic context.

In the 64-bit computing era, we got lazy because flat, abundant memory made it easy to dump everything into one address space and let the OS sort it out. But high-performance systems—databases, game engines, real-time systems—never stopped thinking about cache hierarchies, working sets, and locality of reference. They know that treating all memory as equivalent is a lie. What you need right now should be close and fast. What you might need later should be accessible but out of the way.

AI agents benefit from exactly the same thinking. Instead of loading your entire knowledge base, tool registry, and project history into one context soup, architect deliberate tiers:

  • Working context: The handful of facts, tools, and constraints actively shaping the current decision.
  • Reference context: Indices, headers, and pointers you can traverse when you need deeper detail.
  • Archive context: Historical state you almost never touch but keep for auditability or rare edge cases.

Prompts should travel light across tiers. Inclusion should be explicit and reversible. And you should always know, at any moment, what’s live versus what’s cold.
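
One way to make those tiers explicit and reversible in code. The class below is only a sketch of the idea, not an existing framework:

    from dataclasses import dataclass, field

    @dataclass
    class TieredContext:
        """Explicit, reversible inclusion: you always know what is live versus cold."""
        working: dict[str, str] = field(default_factory=dict)    # facts shaping the current decision
        reference: dict[str, str] = field(default_factory=dict)  # headers, indices, pointers
        archive: dict[str, str] = field(default_factory=dict)    # audit trail, rarely touched

        def promote(self, name: str) -> None:
            """Page a reference item into working context for the current task."""
            self.working[name] = self.reference.pop(name)

        def demote(self, name: str) -> None:
            """Task done: move the item back out so the next prompt travels light."""
            self.reference[name] = self.working.pop(name)

        def render_prompt(self) -> str:
            """Only working-tier content ever reaches the model."""
            return "\n\n".join(f"# {name}\n{text}" for name, text in self.working.items())

The particular data structure doesn't matter; what matters is that promotion and demotion are explicit, auditable operations rather than an ever-growing transcript.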

Why Small Advances Compound Exponentially

One subtlety I’ve noticed: small improvements in first-pass correctness have exponential downstream effects.

When an AI agent gets something wrong on the first try, that wrong answer enters the context. Now the conversation includes not just the correct solution you’re trying to reach, but also the failed attempt, the correction, and the diff between the two. If it takes five tries to get something right, your context is now littered with four wrong answers and all the metadata around fixing them.

This isn’t just clutter—it’s cognitive interference. Every subsequent task must now attend to a context that’s partially contradictory. The model has to weigh correct state against historical errors, and that diffuses its reasoning.

By contrast, when the agent nails it on the first try—because the prompt was crisper, the context was leaner, or the model itself got smarter—the context stays clean. No pollutants. No correction loops. The next task starts from a position of clarity, which increases its odds of succeeding cleanly, which keeps the context clean for the task after that.

It’s a compounding effect. Correctness begets clarity begets correctness. I’ve felt this in practice: recent versions of AI coding tools feel dramatically more productive not just because individual responses are better, but because they don’t poison the well with failed attempts.
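
As a toy illustration with made-up numbers: suppose a leaner context lifts first-try success per task from 70% to 85%. Over a ten-task session, the odds of a run with no corrective detours at all jump from roughly 3% to roughly 20%:

    # Toy model: assumed first-try success rates per task, not measured data.
    p_cluttered, p_clean = 0.70, 0.85
    tasks = 10

    for label, p in [("cluttered context", p_cluttered), ("clean context", p_clean)]:
        # Probability the whole session stays free of failed attempts and their residue.
        print(f"{label}: {p ** tasks:.1%} chance of a fully clean {tasks}-task run")

Crude as this model is, it shows why modest gains in first-pass correctness feel so disproportionate in practice.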

What “Good” Looks Like

After months of working this way, I have a felt sense of what well-engineered context looks like:

High signal density: If I were to read the live context as a human, nearly every sentence would be relevant to the current task. No filler, no “just in case” scaffolding, no zombie instructions from three tasks ago.

Low tool entropy: The agent sees a handful of commands appropriate to its current goal—not a shopping mall directory of every possible action.

Short planning loops: When the agent decides what to do next, it spends its reasoning budget on the task, not on negative checks (“not that tool, not that module…”) to exclude irrelevant options.

Artifacts over transcripts: Completed work manifests as files, diffs, structured data, and metrics—not long conversational threads that need to be re-parsed every time.

Predictable forgetting: When a task completes, it leaves behind exactly what’s needed (the artifact, a one-screen receipt) and nothing more. The ephemeral workspace vanishes.

These properties aren’t just aesthetic. They’re functional. They’re what allow an AI agent to stay sharp across long sessions, juggle multiple projects without confusion, and deliver consistent quality instead of regressing into vagueness as context accumulates.
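
To make the "predictable forgetting" receipt concrete, here's what such a one-screen record might look like as data. The fields and file names are purely illustrative:

    from dataclasses import dataclass

    @dataclass
    class TaskReceipt:
        """What a completed task leaves behind instead of its full transcript."""
        task: str
        inputs_used: list[str]   # e.g. which markdown modules were mounted
        artifacts: list[str]     # paths or identifiers of the produced outputs
        notes: str = ""          # a line or two, not a conversation log

    receipt = TaskReceipt(
        task="Generate landing-page hero copy",
        inputs_used=["project.md", "voice.md"],
        artifacts=["landing/hero.md"],
        notes="First pass accepted; no retries.",
    )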

The Commodore 64 Lesson: Discipline Scales

When I upgraded from the VIC-20 to the Commodore 64, I suddenly had ten times the memory. I could have gotten sloppy—loaded bigger libraries, written longer functions, stopped worrying about byte counts.

But I didn’t. I kept the discipline, and it paid off. My programs were faster, more stable, and easier to debug than the bloated alternatives my peers wrote. The habits forged in scarcity became advantages in abundance.

I see the same dynamic playing out with AI context windows. Yes, they’re getting bigger—some models now offer a million tokens or more. But that doesn’t mean we should abandon context hygiene. If anything, it means we should double down.

Larger windows don’t eliminate the attention problem; they just defer it. A model with a million-token context is still performing self-attention across that entire space. If 90% of it is noise, the model is wasting 90% of its cognitive budget on irrelevance. The solution isn’t to “wait for bigger models”—it’s to build systems that keep the signal-to-noise ratio high regardless of window size.

Discipline scales. The tighter your context engineering, the more intelligence you extract from each token budget. That’s true at 250K tokens, and it’ll still be true at 10 million.

Elegance as an Intelligence Multiplier

There’s a deeper lesson here about the nature of intelligence—both human and artificial.

We tend to think of intelligence as raw capacity: more memory, faster processing, bigger models. And yes, capacity matters. But in practice, clarity matters more.

A brilliant person with a cluttered workspace and a disorganized mind will underperform a moderately smart person who thinks in clean, modular structures. The same is true for AI systems. A smaller model with a pristine context will often outperform a larger model drowning in noise.

This is why the VIC-20 era mattered. Constraints forced us to find elegant solutions—not because elegance is virtuous, but because inelegance literally didn’t fit. We learned to factor problems cleanly, reuse patterns efficiently, and think in layers of abstraction that composed well.

Those same principles now apply to context engineering. When you’re forced to think carefully about what goes into context and when, you naturally develop better abstractions. You separate concerns more cleanly. You define interfaces more precisely. You build systems that are easier to reason about—not because you’re trying to be clever, but because bloat is expensive and clarity is survival.

In a strange way, the rise of LLMs has brought us back to fundamentals. Good engineering isn’t about having infinite resources; it’s about making deliberate, well-structured choices that allow intelligence—whether silicon or carbon-based—to do its best work.

Conclusion: Old Constraints, New Systems

We’ve come full circle. The tyranny of the VIC-20’s 3.5KB of RAM taught a generation of programmers to write lean, disciplined code. Then the 64-bit era made us lazy. Now, LLMs are reminding us that context isn’t free, attention is finite, and clarity compounds.

The difference is that we’re no longer optimizing bytes—we’re optimizing meaning. We’re building semantic virtual memory systems. We’re paging in knowledge modules just-in-time. We’re using sub-agents as ephemeral sandboxes that emit clean artifacts and then vanish. We’re treating context like a carefully scheduled resource, not a junk drawer.

And it’s working. Tight context produces sharper reasoning. Modular design makes systems more maintainable. Discipline scales as models grow.

The lesson is timeless: intelligence isn’t just about how much you can hold in memory—it’s about how clearly you can think with what you have. The VIC-20 taught us that once. AI agents are teaching us again.

As we build the next generation of AI systems, we’d do well to remember: elegance isn’t a luxury. In a world of limited attention and finite resources, it’s the ultimate intelligence multiplier.

And if you ever find yourself building with AI agents, think like you’re back on a VIC-20. Keep your context tight. Load only what you need. Make forgetting structural. Let clarity compound.

The old constraints are teaching us new ways to build smarter, leaner systems—one carefully managed token at a time.


What are your experiences with context management in AI systems? Have you found patterns that keep your agents sharp across long sessions? I’d love to hear what’s working for you.
