A Blueprint for Future Software Teams
A Practical Guide for Compounding Learning with AI
The model is frozen. The scaffolding learns.
Teams that build knowledge canons achieve compounding productivity gains.
What You'll Learn
- ✓ Why AI models don't learn from your conversations—and what does
- ✓ How to build organizational learning systems that compound over time
- ✓ The three-layer knowledge canon that turns insights into assets
- ✓ Design-first workflows that make code regenerable and ephemeral
- ✓ How to harvest exponential gains from every model upgrade
Scott Farrell
LeverageAI
The Frozen Model Paradox
You've been treating AI like it remembers you. It doesn't.
Every conversation you've had with Claude or ChatGPT? The model forgot it the moment you closed the tab. That brilliant prompt you crafted last month that finally got the AI to understand your architecture? Gone. The conventions you thought you were "teaching" it about your codebase? Never learned.
This isn't a bug. It's the fundamental architecture of how these systems work. And if you don't understand this, everything you're building on top of AI is on shaky ground.
Most teams are operating under a false mental model—one that leads them to waste time, miss opportunities, and fundamentally misunderstand how to build compounding capability with AI. This chapter tears down that mental model and replaces it with the truth. Once you see it, you can't unsee it. And everything about how you work with AI will change.
TL;DR
- • AI models don't learn from your conversations—weights are frozen between expensive 6-month training runs
- • Fine-tuning and RAG are technical plumbing, not organizational learning systems
- • The psychological journey: blame AI → blame prompt → realize it's the scaffolding around the model
- • Individual productivity with AI doesn't automatically create team capability—you need infrastructure for that
- • The right question: "How does my team learn when the AI can't?"
The Uncomfortable Truth About AI Learning
What People Think Happens
If you're like most developers using AI coding tools, you probably have a mental model that goes something like this:
- •"I've trained it to understand my coding style"
- •"It knows our codebase now—I've shown it our patterns"
- •"We've been teaching it our conventions over the last few weeks"
- •"The more I use it, the better it gets at helping me"
This feels true. Within a single session, the AI does use context from earlier in the conversation. It feels like it's "getting" you. It adapts its responses. It references things you said ten prompts ago. Marketing language from AI companies reinforces this impression with words like "learns," "adapts," and "improves."
But there's a critical distinction most people miss: context within a session is not learning. It's just short-term memory. And when that session ends, it's gone.
What Actually Happens
Here's the reality that most people don't understand:
AI models are frozen between training runs.
Let's break that down:
- •Training happens in AI labs—massive compute clusters running for weeks or months, costing millions of dollars in GPU time.
- •Your prompts don't modify the model—when you chat with Claude or ChatGPT, you're not changing its weights or teaching it anything permanent.
- •Other users' prompts don't affect your experience—the model you use today is identical to the one everyone else uses.
- •Between major releases, the model is completely static—GPT-4 to GPT-4o to GPT-5, each is a separate, expensive training run.
"AI doesn't learn like people think it learns. The models take six months to bake in an AI lab and a million dollars or something. So it can't really learn from chats in a quarter or chats from someone in person."— From the original conversations that inspired this book
The training cycle for frontier models looks like this:
- Months of preparation: Curating training data, building infrastructure, designing the architecture
- Weeks of training: Running massive parallel compute across thousands of GPUs
- Weeks of fine-tuning: RLHF (Reinforcement Learning from Human Feedback), safety training, alignment
- Testing and deployment: Red-teaming, benchmarking, gradual rollout
- Then: weights are frozen
Once a model is deployed—Claude 4.5, GPT-4o, Gemini 2.0—those weights don't change again until the next major training run. Every conversation you have uses the exact same underlying model. Your brilliant insights don't update it. Your corrections don't improve it. Your team's patterns don't get baked in.
The Implications Nobody Talks About
When you really internalize this, it changes everything:
The Clean-Slate Reality
What You Might Think
- • "I've trained this AI on our coding standards"
- • "It knows our architecture now"
- • "We've built up shared context over weeks"
- • "The more we use it, the better it understands us"
What's Actually True
- • Every new session starts from zero (from model's perspective)
- • All that "training" is gone when you close the tab
- • Your codebase patterns are invisible unless you provide them each time
- • The model tomorrow is identical to the model today
Your brilliant prompt from last month that finally got the AI to understand your architecture? The model doesn't remember it. Your team's conventions that you carefully explained? Forgotten. The edge cases you taught it to watch for? Never learned.
This is why you keep re-explaining the same things. This is why new team members can't benefit from the "training" you've done. This is why your AI productivity feels stuck at an individual level and doesn't compound across your team.
But What About Fine-Tuning and RAG?
At this point, some readers are probably thinking: "Wait, I know about fine-tuning and RAG (Retrieval-Augmented Generation). Don't those solve this problem?"
Short answer: Not really. Not in the way you need.
Let's look at both, because understanding their limitations is crucial.
Fine-Tuning: The Heavy Hammer
Fine-tuning means taking a pre-trained model and continuing to train it on your specific data, actually modifying the model's weights. This sounds like exactly what we want—teaching the model about your domain.
Here's the reality for most teams:
- ✗Expensive: Compute costs can run thousands to tens of thousands of dollars per training run
- ✗Slow: Even small fine-tuning runs take days; full retraining takes weeks
- ✗Requires ML expertise: You need to curate training data, choose hyperparameters, evaluate results
- ✗Weights freeze again after fine-tuning: You're back to the same problem—the model is static once deployed
- ✗Doesn't capture what you learned: It bakes in training data, not the insights from your last sprint
RAG: Retrieval at Inference Time
RAG (Retrieval-Augmented Generation) works differently. Instead of modifying the model, RAG pulls relevant external documents into the context window at query time. Think of it as giving the model a library card: it can look up information when it needs it, but it doesn't memorize the books.
RAG is genuinely useful for certain problems:
- ✓Dynamic knowledge: Update your documents, and the AI immediately has access to new information
- ✓Large document collections: Can search across thousands of pages to find relevant context
- ✓Real-time updates: No retraining needed when information changes
- ✓Lower cost: Storage and retrieval cheaper than retraining
But RAG still isn't organizational learning:
- ✗It's plumbing, not process: RAG tells you what was written down, not what you learned
- ✗No feedback loop: Doesn't capture insights from failed experiments or edge cases discovered in production
- ✗Static documents: Only as good as what you've written and how well you've indexed it
| Aspect | RAG (Retrieval-Augmented Generation) | Fine-Tuning |
|---|---|---|
| Knowledge location | External: databases, documents, real-time data | Internal: frozen in model weights from training |
| Update frequency | Can be updated instantly | Requires retraining cycle (weeks/months) |
| Organizational learning? | Better for dynamic, evolving knowledge—but only captures documents | Better for stable domain behavior—but freezes knowledge at training time |
| Resource requirements | Lower compute, higher storage/retrieval cost | Higher compute for training, lower inference cost |
| Captures "what went wrong"? | Only if you manually write it down | Only if it was in the training data |
The Gap Neither Fills
Both fine-tuning and RAG are technical solutions to technical problems. They help you inject knowledge into the AI's context—either by modifying weights or by pulling in documents. That's valuable.
But neither captures the kind of learning that makes teams compound:
- →What went wrong in the last sprint and why
- →Why we stopped using library X and switched to Y
- →The edge case that bit us three times until Tim figured out the pattern
- →The mental model your security expert has about your threat surface
- →The anti-patterns you've learned to avoid through painful experience
This is organizational learning—the accumulated wisdom that makes a team more than the sum of its individuals. And this is what compounds over time to create genuine competitive advantage.
Fine-tuning and RAG address technical problems (how to inject domain knowledge). Organizational learning is a human + process problem: how to systematically capture, refine, and apply what your team learns.
The Psychological Journey
Once you understand that AI models can't learn from your conversations, you go through a predictable psychological progression. Recognizing these stages helps you understand where you are—and where you need to go.
Stage 1: Blame the AI
This is where everyone starts.
Your first experiences with AI-generated code are probably disappointing:
- ✗"This is wrong"
- ✗"It doesn't understand what I want"
- ✗"AI is overhyped—look at this nonsense"
- ✗"It can't handle anything complex"
This is the natural first reaction: blame the tool. The AI is stupid, limited, not ready for real work.
Most people get stuck here. They conclude AI isn't useful for their work and stop experimenting. Or they use it only for the most trivial tasks, missing the real potential.
"When you first start to prompt AI, you write a prompt, it says some crap, so you first blame the AI. Oh, that's stupid. Look how silly the AI is."
Stage 2: Blame the Prompt
If you persist, you eventually have a breakthrough: output quality correlates with input quality.
You realize the AI isn't fundamentally broken—you just weren't giving it what it needed. So you shift to a prompt engineering mindset:
- →"Let me phrase this more clearly..."
- →"I'll add more context about what we're trying to do..."
- →"Maybe if I give it examples of good vs bad..."
- →"I need to specify the constraints more carefully..."
And it works! Outputs improve. Sometimes dramatically. This validates the approach and you invest more in learning how to prompt well.
But limits emerge:
The Context Engineering Wall
You can't just context-engineer every problem to death. At some point, the overhead of crafting the perfect prompt for every task becomes its own bottleneck. You're spending more time on prompts than you're saving on implementation.
The Team Scalability Problem
Your individual prompt skills don't automatically transfer to your team. Each person is crafting their own prompts from scratch. Knowledge doesn't accumulate. You're stuck at individual productivity.
The Repetition Trap
You keep re-explaining the same architectural constraints, the same security requirements, the same conventions—because the model doesn't remember. Every session starts from zero.
"You can't just sort of go on context engineering forever. It doesn't work. Then, because you come against your next problem, and what are you going to do is just context engineer that to death as well. So it's not really learning."
Stage 3: See the Scaffolding
This is the breakthrough that changes everything.
The insight:
The problem isn't the prompt. It's what's around the prompt.
Individual prompts are ephemeral—you craft them, use them, and they're gone. But scaffolding persists. The system that produces good prompts is the real asset.
What is this scaffolding? It's the knowledge infrastructure around the model:
- ✓Persistent context files that document your architecture, conventions, and constraints
- ✓Design documents that capture intent, not just implementation
- ✓Learnings files that record what went wrong and why
- ✓Team-shared knowledge canons that make specialist expertise available to everyone
These artifacts survive across sessions. They compound over time. They scale across team members. They make organizational learning possible even when the model itself can't learn.
This is where the shift from "how do I prompt better" becomes "how do we learn as a team." And this is where real compounding begins.
The Team Learning Gap
Here's the uncomfortable truth: individual productivity with AI doesn't automatically create team capability.
You can have a team where every single developer is using Claude Code or Cursor, getting individual productivity gains, and still have zero team-level compounding. In fact, this is the norm.
Individual Productivity ≠ Team Capability
The problem manifests in predictable ways:
- Different prompting styles across the team: Alice gives the AI detailed context and gets great results. Bob uses minimal prompts and gets mediocre outputs. Neither knows what the other is doing.
- Quality varies wildly by developer: Code reviews reveal huge inconsistencies in AI-generated code—because there's no shared understanding of how to guide the AI.
- No mechanism for sharing what works: Someone figures out a brilliant pattern for using AI with your architecture—but it stays in their head. The team doesn't benefit.
- Everyone reinvents the wheel: Each developer crafts prompts for the same architectural constraints, the same security requirements, the same error handling patterns—over and over.
The "10x developer with AI" doesn't automatically create a "10x team." Without infrastructure for sharing and compounding, individual gains stay individual.
The Coordination Tax
Traditional organizations have always struggled with knowledge transfer:
How Organizations Try to Share Knowledge
Meetings
Weekly architecture reviews, design discussions, knowledge-sharing sessions.
Problem: Slow, poorly retained, scale badly. More people = exponentially more coordination overhead.
Documentation
Confluence pages, wikis, architecture docs.
Problem: Written but rarely read. Quickly becomes stale. No one trusts it. Doesn't capture tacit knowledge or recent learnings.
Tribal Knowledge
"Ask Sarah, she knows how auth works" or "Tim has the security context."
Problem: Doesn't scale beyond ~10 people. Walks out the door when people leave. Creates bottlenecks around key individuals.
AI doesn't solve coordination problems—it amplifies individual capability. This means:
Fast individuals + slow organizational learning = everyone uses AI, but the team doesn't get better.
The Speed Mismatch
This is perhaps the most under-appreciated problem in AI adoption.
Individuals execute learning loops in minutes to hours:
- Ask AI a question
- Get an answer
- Critique it
- Refine the prompt
- Get better output
- Internalize what worked
Organizations execute learning loops in weeks to months:
- Someone discovers a problem
- Discuss in meetings
- Write up proposal
- Get approval
- Document new standard
- Train team
- Monitor compliance
That's a 100-1,000× difference in iteration speed.
AI makes individuals even faster—but organizational learning is still constrained by meetings, approval cycles, documentation processes, and training overhead. The gap widens.
"For more than a century, economies of scale made the corporation an ideal engine of business. But now, a flurry of important new technologies, accelerated by artificial intelligence (AI), is turning economies of scale inside out."— MIT Sloan Management Review, "The End of Scale"
The critical question: Can we make organizational learning run at individual speed?
This is what the rest of this book shows you how to do.
Why This Changes Everything
Once you internalize that AI models are frozen—that they can't learn from your conversations—three major implications cascade out. Each one fundamentally changes how you should think about building with AI.
Implication 1: Tool Strategy
If the model can't learn, you need infrastructure that can.
Most organizations approach AI as a purchasing decision:
- →"Should we buy GitHub Copilot or Cursor?"
- →"Which model is better, Claude or GPT-4?"
- →"Let's get everyone AI licenses"
These are reasonable questions, but they miss the point. Buying better AI tools is not the same as building team capability.
The asset isn't access to AI—it's the system around it:
- ✓Knowledge canons that persist across sessions
- ✓Design documents that capture team understanding
- ✓Learning extraction processes that compound insights
- ✓Workflows that make specialist knowledge available to everyone
You can swap out tools—Claude to GPT to Gemini—and this infrastructure still works. But without it, even the best tools deliver only individual productivity, not compounding team capability.
Implication 2: Team Process
"Everyone use AI more" is not a strategy.
You need explicit mechanisms for:
Capturing what works
When someone figures out a brilliant pattern—how do you make it available to the rest of the team?
Sharing across the team
How does a junior developer benefit from the scaffolding a senior has built? How does knowledge flow from personal → team → org?
Compounding over time
How do insights from this sprint improve how you work next sprint? How does today's learning make tomorrow easier?
This is the scaffolding this book teaches you to build. Not "better prompts" (though that helps). Not "which tool to buy" (though tools matter). But the organizational learning infrastructure that turns individual AI productivity into team-level compounding.
Implication 3: Career Positioning
Here's a career insight most people are missing:
Individual AI skill is rapidly becoming table stakes. Within 12-24 months, "can use AI to code faster" will be as unremarkable as "can use Stack Overflow" or "knows Git."
The differentiator won't be "I'm good with AI." It will be "I can build team-level capability with AI."
The valuable skillset:
- •Can architect knowledge systems that persist beyond individual sessions
- •Can design workflows that extract and compound learning
- •Can make specialist expertise scale through scaffolding
- •Can build systems that get better with each model upgrade
This is what it means to be an "AI-native architect"—someone who understands not just how to use AI tools, but how to build organizational infrastructure that compounds capability over time.
The Question This Book Answers
We've established the core problem: AI models don't learn from your conversations. The weights are frozen. Your brilliant insights don't update the model. Your team's patterns aren't baked in. Every session starts from zero.
This leads most people to ask the wrong question:
"How do I use AI better?"
That question keeps you stuck at individual productivity. You get better at prompts. You learn the tools. You speed up your own work. But it doesn't compound across your team, and it doesn't survive model upgrades.
The right question—the one this book answers—is:
"How does my team learn when the AI can't?"
This reframing changes everything. It shifts focus from the tool to the infrastructure around the tool. From individual skill to team capability. From session-level productivity to compounding organizational learning.
Preview of the Answer
Here's the core insight that the rest of this book unpacks:
The learning happens in the scaffolding—the artifacts and processes around the model, not in the model itself.
Specifically:
- ✓Versioned knowledge files (markdown canons) that persist across sessions and team members
- ✓Design documents as primary artifacts (not code)
- ✓Learning extraction rituals built into your Definition of Done
- ✓Three-layer knowledge hierarchy (personal → team → org) that compounds over time
This infrastructure makes it possible to:
- →Share specialist knowledge without meetings
- →Onboard new developers at speed
- →Get multiplicative gains from model upgrades
- →Build team capability that compounds quarterly
By the end of this book, you'll have a concrete blueprint for building this infrastructure. You'll understand not just what to build, but why it works and how to implement it starting this week.
"If the model is frozen, something else must learn. That something is your team—but only if you build the infrastructure to make it happen."
What's Next
In Chapter 2, we'll answer the question: "If learning doesn't happen in the model, where does it happen?"
You'll learn about the scaffolding hypothesis—the core insight that your .md files are acting like "soft weights" sitting on top of the model, and how this creates a learning system that compounds across team members and model generations.
Chapter Summary
- • AI models are frozen between expensive 6-month training runs—they don't learn from your conversations, no matter how much you use them.
- • Fine-tuning and RAG solve technical problems, not organizational learning—neither captures the insights, edge cases, and mental models your team develops over time.
- • The psychological journey has three stages: blame the AI → blame the prompt → realize the problem (and solution) is the scaffolding around the model.
- • Individual productivity doesn't create team capability—without infrastructure for sharing and compounding, everyone reinvents the wheel and organizational learning stays slow.
- • The right question is "How does my team learn when the AI can't?"—and the answer is building versioned knowledge infrastructure that persists, compounds, and scales.
Where Learning Actually Lives
If the model is frozen, where does organizational learning actually happen? The answer isn't in the GPU—it's in your repository, waiting to be structured.
TL;DR
- • Learning happens in the scaffolding around the model—artifacts, processes, markdown files—not in the frozen model weights
- • Your .md files act as "soft weights"—text-based conditioning that gives you fine-tuning benefits without the cost or opacity
- • Two optimization loops interlock: AI labs improve models; you improve scaffolding—together creating multiplicative returns
Chapter 1 left us with a question: if the model is frozen between training runs, where does organizational learning actually happen?
The answer changes everything about how teams should approach AI. It's not about buying better tools or writing better individual prompts. It's about building what I call the Org Brain—a learning system that lives in your repository and compounds value over time.
The Scaffolding Hypothesis
Think of your AI system as having two distinct parts:
Part 1: The Model
Characteristics: Frozen, powerful, general-purpose
Who controls it: AI labs (Anthropic, OpenAI, Google)
Update frequency: Every 6-12 months
Your influence: None (you can only choose which model to use)
Part 2: The Scaffolding
Characteristics: Adaptive, specific to your team, constantly evolving
Who controls it: You and your team
Update frequency: Minutes to hours
Your influence: Total (you build and maintain it)
Most attention goes to Part 1—model capabilities, benchmarks, new releases. But the real leverage is in Part 2. And almost nobody builds it deliberately.
The model is static between releases. You and your team are the adaptive part. When I say "the gradient updates are happening in your repo, not in the GPU," I mean it literally. Your team performs the equivalent of gradient descent—updating knowledge based on feedback—but you do it in text files instead of neural network weights.
"The gradient updates are happening in your repo, not in the GPU."
What this means in practice: every time you notice a failure pattern, you update a file. Every time something works well, you document it. The text files accumulate learning while the model stays frozen. This is organizational learning made concrete and executable.
Markdown Files as "Soft Weights"
Here's the powerful analogy that makes this approach click: your markdown files act as "soft weights" sitting on top of the model.
In machine learning, model weights are numerical values learned during training. They determine how the model behaves. But there's another way to condition model behavior: provide carefully structured text context with every interaction.
Both approaches—model weights and text context—affect AI output. Both accumulate knowledge. But soft weights (text files) have decisive advantages:
| Aspect | Model Weights | Soft Weights (.md files) |
|---|---|---|
| Who updates | AI lab | Your team |
| Update cycle | Months | Minutes |
| Readable by humans | No | Yes |
| Version controlled | No | Yes (Git) |
| Team-specific | No | Yes |
| Cost to update | Millions | Free |
Bottom line: Soft weights give you fine-tuning benefits without the cost, delay, or opacity.
When you include a well-structured markdown file with your prompt, you're conditioning the model. The file sets expectations, provides examples, establishes constraints, and supplies domain knowledge. Every session that includes the file benefits from every refinement you've ever made to it.
Consider a simple example: a file that says "We use AWS, not GCP" saves the same correction hundreds of times. That's the power of persistence. Prompts are ephemeral—one session, then gone. Files are persistent—across sessions, across team members, across model upgrades.
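As a minimal sketch of what such a file might contain (the specific entries beyond the AWS line are illustrative, not a recommended standard):

## Infrastructure
- We use AWS, not GCP
- Prefer managed services over self-hosted equivalents
- Every new service gets a design doc before any code is generated

Three lines like these, included in every session, replace the corrections you would otherwise type into every prompt.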
Why This is Actually Better
Here's the counterintuitive truth: it's actually better that AI can't learn from your conversations. If models updated their weights from every chat, you'd face serious problems.
If AI Could Learn From Conversations
❌ The Problems You'd Face
- • Would it learn the right things—or your bad habits too?
- • How would you audit what it learned?
- • Could you undo something it learned incorrectly?
- • Would it learn contradictory lessons from different team members?
- • Who would be responsible when it applies "learned" behavior incorrectly?
Outcome: Opaque, uncontrollable, potentially dangerous accumulation of behaviors
✓ With Soft Weights (Text Files)
- • You control exactly what's in the files
- • Git history shows every change and who made it
- • PR process for team files = review before "learning"
- • Wrong thing in a file? Delete it immediately
- • Conflicts are visible and resolved through normal code review
Outcome: Transparent, controllable, auditable knowledge accumulation
The Inspectability Advantage
Model weights are a black box. You can't open them up and see what the model "knows." But text files? Completely transparent. When AI output is wrong, you can debug:
- • "What was in the context for this session?"
- • "Is our coding.md missing this pattern?"
- • "Did the design doc fail to specify this constraint?"
Debugging becomes possible. Improvement becomes systematic. Knowledge becomes a first-class artifact you can inspect, diff, and refine.
The Ownership Advantage
A fine-tuned model is still owned by the AI lab. Your markdown files? Owned by you. This ownership is portable across:
Portability of Knowledge Canon
Model Providers
- • Claude → GPT → Gemini
- • Same files work everywhere
- • No vendor lock-in
Development Tools
- • Cursor → Claude Code → Copilot
- • Tool-agnostic knowledge
- • Survive tool churn
Team Members
- • New hires inherit the canon
- • Knowledge doesn't leave with people
- • Onboarding becomes instant
The Learning Loop in Practice
Now that we understand what the scaffolding is and why it's better than model learning, let's see how it works. The learning loop is simple but powerful:
The Five-Step Learning Loop
Use AI
Include current scaffolding (markdown files, design docs) as context in your AI session
Observe Output
Critically evaluate what worked well and what didn't; note surprises and failures
Identify Pattern
Is this a one-off edge case or a recurring issue? Will this pattern help future work?
Update Scaffolding
Add to the appropriate file (personal, team, or org level)—this is where learning accumulates
Repeat
Next use automatically benefits from the update; the loop continues with improved scaffolding
This is organizational learning made concrete and executable. It's not a vague aspiration—it's a workflow you can implement today.
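Here's one pass through the loop, using a hypothetical failure (the file name and entry wording are illustrative):

- Use AI: generate a customer-export endpoint with coding.md and the design doc in context
- Observe output: the generated handler logs the full customer record, email address included
- Identify pattern: this is the second time PII has leaked into logs, so it's not a one-off
- Update scaffolding: add a line to the team's security.md: "Never log full entity objects; log IDs only"
- Repeat: the next session reads the updated file and logs IDs from the start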
Three Speeds, One Mechanism
The loop operates at three different speeds depending on the scope:
Individual Loop (Fast)
Speed: Minutes
Files affected: Personal learned.md only
Benefits: You get better quickly; low friction experimentation
Team Loop (Moderate)
Speed: Hours to days (needs PR/review)
Files affected: Shared team files (coding.md, infrastructure.md)
Benefits: Everyone gets better; knowledge cross-pollinates automatically
Org Loop (Stable)
Speed: Quarterly updates typical
Files affected: Enterprise-wide canon (security.md, architecture.md)
Benefits: New teams start ahead; company-wide consistency
Three speeds, three scopes, one mechanism. The architecture is fractal: the same pattern works at individual, team, and organizational scale.
The Accumulation Effect
Each loop iteration deposits what I call a "fossil" in the org brain. Over weeks and months, substantial knowledge accumulates. Unlike meeting notes or wiki pages that rot from disuse, these files have a built-in incentive for maintenance: they're actively used every day. When a file goes stale, bad AI outputs immediately reveal it. The feedback loop is tight and automatic.
From Individual Productivity to Team Capability
Chapter 1 introduced the speed mismatch: individuals learn in minutes, organizations learn in months—a 100x to 1,000x difference. The scaffolding approach collapses that gap.
Individual learnings are captured in personal files immediately (fast loop). When validated through repeated use, they're promoted to team files (moderate speed). When proven across multiple teams, they're promoted to org-level canon (appropriate slowness for enterprise standards). Organizational learning can now happen at near-individual speed.
The Compounding Effect
Here's where it gets exponential. Each model release multiplies the value of your scaffolding. Without scaffolding, a new model means starting from scratch—everyone learns its quirks independently. With scaffolding, a new model becomes a smarter executor of accumulated team knowledge.
It's like swapping a better engine into a car you've already tuned. When Claude Opus 4.5 was released, I had months of scaffolding in place—refined design documents, well-maintained canon files, established patterns. My output quality jumped immediately, not just because the model was better, but because the better model was executing against better knowledge. That's multiplicative improvement.
"You're doing SGD on your workflows, your canon, and your taste. Two optimization loops, interlocking."
SGD—Stochastic Gradient Descent—is how neural networks learn. They don't improve all at once; they make thousands of tiny adjustments, each one nudging the model slightly closer to useful output. The metaphor here: every time you refine a prompt template, update a design document, or correct AI output that missed the mark, you're doing the same thing to your scaffolding. Small iterations, compounding. And you're running two loops simultaneously—you improving the scaffolding, and new model releases improving the AI executing against it. Neither loop alone is transformative. Together, they compound.
The Team Knowledge Transfer Problem (Solved)
Traditional teams face a persistent problem: critical knowledge lives in people's heads. It leaves when they leave. It's hard to transfer consistently. New hires spend months absorbing tribal knowledge through osmosis and informal conversations.
The scaffolding approach inverts this. Knowledge is externalized in files, version-controlled in Git, and inherited automatically by new team members. When you onboard someone, you give them the repository. They immediately have access to every pattern, every learned constraint, every architectural decision. The AI "reads" this knowledge every session, so new developers benefit from senior expertise without asking.
What the Scaffolding Contains
We'll explore the detailed architecture in Chapter 3, but here's a preview of the three-layer structure:
Preview: The Three-Layer Knowledge Canon
Layer 1: Personal (learned.md)
Your individual failure patterns, preferences, and current context
Example entries: "Current date is late 2025", "Claude 4.5 is current model", "I prefer concise explanations over verbose ones"
Layer 2: Team (coding.md, infrastructure.md)
Shared conventions, patterns, and "we use X not Y" rules
Example entries: "We use AWS for all infrastructure", "We prefer functional components in React", "Never store PII in logs"
Layer 3: Org (security.md, architecture.md)
Enterprise-wide guardrails, compliance requirements, architectural principles
Example entries: "All services must implement request tracing", "Data residency requirements for EU customers", "OAuth 2.0 for all authentication"
Each layer serves a different purpose at a different speed. Personal files enable fast individual learning. Team files enable knowledge sharing without meetings. Org files provide enterprise consistency without bureaucracy.
What Makes Good Scaffolding
Not all knowledge files are created equal. Effective scaffolding has these characteristics:
- Specific: Concrete examples, not vague principles ("Use Postgres for relational data" not "Choose appropriate databases")
- Actionable: Can be applied directly to AI prompts without interpretation
- Maintained: Updated when reality changes; staleness is visible through bad outputs
- Used: Actually included in AI sessions; if it's not read, it's not scaffolding
The key test: does including this file in context improve AI output quality? If not, it's documentation theater.
Design Documents as Scaffolding
Design documents are a special and crucial form of scaffolding. They capture intent, constraints, and architecture for specific features or systems. They become the "source of truth" for code generation—and they accumulate learnings about what works in your specific context.
We'll explore this deeply in Chapters 4 and 5, but understand now: design documents aren't just planning artifacts. They're the primary output of knowledge work in an AI-native team.
The Meta-Learning Pattern
Here's a subtle but important point: the scaffolding approach is itself learnable. As you practice building and maintaining knowledge files, you develop a meta-skill—learning how to build systems that learn.
You get better at identifying what belongs in files. You get better at structuring knowledge for AI consumption. You get better at the promotion and refactoring process—knowing when to move a learning from personal to team to org level.
This compounds too. The better you get at building scaffolding, the faster your scaffolding improves. And when better scaffolding meets better models, you get multiplicative returns.
Two Optimization Loops, Interlocking
There are two optimization loops running simultaneously:
The Dual Optimization Engine
Loop 1: AI Labs (You Don't Control)
Anthropic, OpenAI, and Google continuously improve base models through massive training runs. New capabilities, better reasoning, improved accuracy.
Loop 2: Your Team (You Do Control)
Your team continuously improves scaffolding through daily practice. Better knowledge files, refined design documents, accumulated patterns.
The Interaction (This Is Where Magic Happens)
- • Better model + same scaffolding = better output
- • Same model + better scaffolding = better output
- • Better model + better scaffolding = multiplicative improvement
The AI lab does stochastic gradient descent on model weights. You do the equivalent on your workflows, your canon, and your taste. Two optimization processes, running in parallel, creating compound returns.
What This Means for the Rest of the Book
We've established the core insight: learning happens in the scaffolding, not the model. This insight drives everything that follows.
The next chapters answer the natural follow-up questions:
The Architecture Question
How should knowledge be organized? → Chapter 3: The Three-Layer Canon
The Artifact Question
What should be the primary output of development work? → Chapter 4: Design Documents as Gospel
The Process Question
How do we make learning extraction systematic? → Chapter 6: Definition of Done v2.0
The Compounding Question
How do we maximize the flywheel effect? → Chapter 9: Model Upgrade Flywheel
Chapter Summary
- • Learning happens in the scaffolding around the model—artifacts, files, processes—not in frozen model weights
- • Markdown files act as "soft weights"—text-based conditioning that shapes AI behavior like fine-tuning, but with full control
- • This is better than AI learning from conversations: inspectable, controllable, owned by you, portable across tools
- • The learning loop: use AI → observe → identify patterns → update scaffolding → repeat
- • Three speeds (individual/team/org) enable fast personal learning and stable organizational knowledge
- • Two optimization loops interlock: AI labs improve models; you improve scaffolding; together = multiplicative returns
- • The org brain is an architecture you build deliberately, not a tool you purchase
What's Next
We've established that learning lives in the scaffolding. But how should that scaffolding be organized?
Chapter 3 reveals the three-layer architecture that makes knowledge accumulation practical, scalable, and sustainable—from your personal learned.md to enterprise-wide security.md.
The Three-Layer Canon
How to organize team knowledge so individuals can iterate fast while the organization builds stable, compounding capability.
Your team's AI knowledge is probably scattered across Slack threads nobody can find, Notion pages that rot, the head of that one person who's currently on holiday, and individual prompt libraries nobody shares.
This isn't a documentation problem. It's an architecture problem.
Most teams oscillate between two failure modes: either everything is personal and chaotic (fast but inconsistent), or everything requires committee review (consistent but glacial). You need a structure that enables both fast individual iteration and stable organizational learning.
The answer is a three-layer knowledge canon.
What Is a Canon?
The term comes from literature: the accepted, authoritative body of work in a field. In software teams, the canon is the accepted, authoritative body of knowledge that AI reads to understand how your team thinks and builds.
This isn't documentation (which implies an afterthought). It's active infrastructure—the scaffolding that conditions every AI interaction.
Three layers. Three speeds. One compounding system.
Why Three Layers (Not One, Not Ten)
The three-layer structure optimizes a specific trade-off: fast individual iteration versus stable team foundation.
One layer can't serve both needs. A single shared knowledge file becomes either:
- Too loose — Everyone edits freely, quality degrades, nobody trusts it
- Too rigid — Every change needs approval, updates stall, people work around it
Three layers solve this by matching update velocity to scope:
Layer 1: Personal
Update cycle: Minutes to hours
Review rigor: None—your experimental sandbox
Scope: Just you
Example: learned.md in your home directory
Layer 2: Team
Update cycle: Days to weeks (ticket-driven)
Review rigor: Peer review via PRs
Scope: Your team (5-20 people typically)
Example: coding.md, infrastructure.md in team repo
Layer 3: Organization
Update cycle: Quarterly, or on major org changes
Review rigor: Senior architect / security review
Scope: Enterprise-wide
Example: security.md, architecture.md in org canon repo
Knowledge flows upward through promotion: insights bubble up from personal → team → org as they're validated. Constraints flow downward through inheritance: org-wide principles cascade to every team, every individual.
This mirrors how good organizations actually work: innovation at the edges, standards from the center.
Layer 1: Personal Canon (learned.md)
Your personal canon is where fast learning happens. It's messy, experimental, and updated the moment you notice something worth remembering.
What goes in your personal learned.md:
Current Context
Information that keeps AI grounded in reality:
- • Current date (use "late 2025" so it stays valid for months)
- • Model versions you're using ("Claude Opus 4.5," "GPT-5.1")
- • Current project focus
Why: AI doesn't know what happened after its training cutoff. Give it temporal grounding.
Failure Patterns
Things AI keeps getting wrong for you specifically:
- • "AI forgets to validate input before processing"
- • "AI overcomplicates simple solutions—prefer the direct path"
- • "AI uses deprecated libraries unless explicitly told otherwise"
Rule: Any correction you make twice goes into learned.md.
Personal Preferences
How you like to work and think:
- • "I prefer functional style over OOP"
- • "Always show reasoning before code"
- • "Use TypeScript strict mode by default"
Impact: AI adapts its outputs to match your style automatically.
The "Messy on Purpose" Principle: Personal canon should be low-friction. Better to capture something imperfectly than not capture it at all. It's experimental ground—not every entry will be correct. Refinement happens through use; wrong things become obvious when they produce bad outputs.
Permission to be messy = actually gets used.
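Pulled together, a personal learned.md might look something like this (every entry is illustrative; the point is the shape, not the content):

## Current context
- It's late 2025; Claude Opus 4.5 is the current model
- Current focus: billing service migration

## Failure patterns
- AI reaches for deprecated library versions unless the version is stated in the prompt
- AI skips input validation on internal endpoints; ask for it explicitly

## Preferences
- Functional style over OOP
- Show reasoning before code
- TypeScript strict mode by default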
Layer 2: Team Canon (coding.md, infrastructure.md)
Team canon is where individual insights become shared conventions. This is knowledge that applies to your whole team, maintained through the same PR process you use for code.
What goes in team canon files:
Conventions and Standards
The "we use X, not Y" decisions:
- • "We use AWS, not GCP"
- • "Our API handlers follow this pattern: [example]"
- • "Error responses always include
request_id"
Impact: AI generates code that matches your team's conventions automatically.
Patterns and Anti-Patterns
What works and what to avoid:
- • "We prefer the repository pattern for data access"
- • "Never import from internal modules directly—use the facade"
- • "Don't write raw SQL outside the repository layer"
Why this matters: New team members (and their AI) inherit battle-tested patterns immediately.
Shared Context
Information that doesn't fit in code but shapes decisions:
- • "This codebase was migrated from Python 2 in 2022—legacy naming exists"
- • "The 'customer' table is actually called 'clients' for historical reasons"
- • "We're gradually moving from REST to GraphQL—prefer GraphQL for new endpoints"
Result: AI understands your context and makes appropriate suggestions.
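Assembled into a single file, a team coding.md built from the examples above might read (entries illustrative):

## Conventions
- We use AWS, not GCP
- Error responses always include request_id
- API handlers follow the canonical pattern: [example]

## Patterns and anti-patterns
- Use the repository pattern for data access; never write raw SQL outside the repository layer
- Never import internal modules directly; go through the facade

## Shared context
- The "customer" table is called "clients" for historical reasons
- Prefer GraphQL for new endpoints; REST endpoints are legacy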
Team Canon as "Tim's Brain"
Here's where team canon becomes powerful: specialist knowledge scales without meetings.
Tim is your security person. Normally, Tim's knowledge lives in his head. When a junior developer makes a security mistake, Tim catches it in review. When Tim's on holiday, security mistakes slip through. When Tim leaves the company, his knowledge walks out the door.
With team canon, Tim writes his thinking into security.md:
## PII Handling
- Never log fields: email, phone, address, SSN
- All PII must be encrypted at rest (use encryption service)
- PII requests require audit trail (use audit_log table)
## Authentication
- All external calls go through API gateway
- Auth tokens expire after 24 hours
- Refresh tokens rotate on every use
## Input Validation
- Validate at API boundary, not in business logic
- Use zod schemas for all inputs
- Never trust client-side validation alone
Now every developer's AI session reads security.md. Tim's thinking is applied automatically. The new hire's AI already knows not to log email addresses. Tim's on holiday? His brain is still here, in the canon.
"Tim writes a whole bunch of stuff into the scaffolding. Now everyone uses it, and you didn't have to sit through the meeting and ignore it."
This pattern applies to any specialist:
- Performance expert → performance.md (caching strategies, query optimization patterns)
- Ops team → infrastructure.md (deployment requirements, monitoring standards)
- Domain expert → billing.md (business rules, edge cases, regulatory constraints)
Specialists scale their expertise through documentation. The canon speaks when they can't.
The PR Process for Team Canon
Team canon changes go through pull requests, just like code:
- Developer notices a pattern worth capturing
- They open a PR adding it to the appropriate file
- Team reviews (catches errors, suggests refinements)
- Merge → the knowledge becomes canonical
Why this matters:
- Review catches errors before they propagate to everyone's AI
- History is preserved (git log shows evolution)
- Attribution is clear (git blame shows who added what)
We'll cover the PR template and triage logic in Chapter 7. For now, understand: team canon is code-like—versioned, reviewed, deliberate.
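A typical canon PR is tiny. It might add a single line to coding.md (wording hypothetical):

- All persisted timestamps are UTC; never store local time

with a one-paragraph PR description explaining the motivation ("third mixed-timezone bug to reach staging this quarter"). Reviewers check that the rule is accurate and general enough to be canonical, then merge.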
Layer 3: Organization Canon (security.md, architecture.md)
Org canon is the enterprise-wide knowledge that all teams inherit on day one. It's conservative, stable, and updated only when something genuinely applies everywhere.
What goes in org canon:
Architectural Principles
- • "All services expose OpenTelemetry traces"
- • "We use event sourcing for domain state"
- • "Microservices own their data—no shared databases"
Security Requirements
- • "Never log PII fields (email, phone, address, SSN)"
- • "All external API calls go through API gateway"
- • "Auth tokens expire after 24 hours maximum"
Compliance Constraints
- • "GDPR requires data residency in EU for EU customers"
- • "PCI scope includes: payment-api, billing-service, card-vault"
- • "All audit logs must be retained for 7 years"
Org-Wide Conventions
- • "All teams use semantic versioning (semver.org)"
- • "ADRs required for architectural decisions (see architecture-decisions/)"
- • "Production deployments require two approvals"
Org Canon as "Day One Inheritance"
A new team spins up. They clone the org canon repo. Their first AI session already knows:
- Security requirements (no PII in logs)
- Architectural patterns (event sourcing, service boundaries)
- Compliance constraints (GDPR, PCI scope)
No 6-month learning curve to understand "how we do things here." The org brain is transferred instantly.
The Conservation Principle
Org canon should be conservative. High bar for what gets added. Updates are deliberate, not reactive.
Ask: "Does this really apply everywhere?"
If it's team-specific, keep it at the team layer. Org canon cascades to everyone—wrong thing here becomes wrong thing everywhere.
How Knowledge Flows Between Layers
The three layers aren't isolated—they're connected by two flows.
Upward Flow: Promotion
Knowledge moves up as it's validated and proven useful.
Personal → Team
- You notice something in your learned.md that bit multiple teammates
- You open a PR to add it to coding.md
- Merged → now everyone benefits
Team → Org
- Team lead notices a pattern that other teams keep rediscovering
- They propose it for org canon
- Senior architects review: "Is this truly universal?"
- If yes → added to org canon, all teams inherit it
The flow is deliberate. Not everything promotes. The bar rises as you move up layers.
Downward Flow: Inheritance
Constraints cascade down automatically:
- New team member joins → inherits org + team canon immediately
- They start personal canon with clean slate (or copy a starter template)
- Org rules apply automatically through canon injection
No "but I didn't know" excuse. The canon tells you what you need to know.
The Recognition Pattern
When your insight makes it from personal → team → org, it's meaningful recognition.
"Your line became how we do things here."
This is better than gamification (badges, leaderboards, XP points). It has intrinsic value: your thinking scaled across the organization.
"The reward isn't badges and leaderboards. It's 'this thing you discovered is now how we do things here.'"
Maintaining the Canon
Why Canon Doesn't Rot Like Documentation
Traditional documentation rots because:
- Nobody uses it
- No feedback loop from reality
- Staleness isn't visible until someone reads it (rarely)
Canon files are different:
- Actively used — AI reads them every session
- Immediate feedback — Bad guidance → bad output → immediately visible
- Built-in incentive — Fixing the canon improves your daily work
The usage pattern keeps it honest.
The Pruning Principle
Adding to canon is important. Removing from canon is equally important.
Signs something should be pruned:
- No longer accurate (stack changed, pattern deprecated)
- Superseded by something better
- Too specific for its layer (team detail in org canon)
Regular review cadence:
- Personal: Whenever you notice staleness
- Team: Monthly-ish (lightweight review)
- Org: Quarterly (formal review by architects)
Version Control as Memory
All canon lives in Git. This gives you:
- Full history — What was added, changed, removed
- Attribution — Who added what and when (git blame)
- Experimentation — Branches for testing changes
- Rollback — Revert to known-good states
This is documentation that acts like code. It deserves the same care.
Practical Structure
Here's a recommended file structure for teams:
/personal
  learned.md (gitignored, or kept in a separate repo)
/team
  coding.md
  infrastructure.md
  testing.md
/org
  security.md
  architecture.md
  compliance.md
Start simple. One shared file is better than none. Add structure as needed.
How to Use Canon Files
Always include relevant canon as context when working with AI:
- Working on code? Include coding.md
- Security-relevant change? Add security.md
- Design work? Include architecture.md
Tools like Claude Code can auto-include from CLAUDE.md. Cursor uses .cursor/rules files. GitHub Copilot reads .github/copilot-instructions.md.
Principle: make inclusion automatic, not manual. The canon should be in every relevant AI session by default.
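For Claude Code, one way to wire this up is to reference the canon from CLAUDE.md using its file-import syntax (paths are illustrative; check your tool's current documentation):

# CLAUDE.md
Project context for every AI session. Treat the canon below as authoritative.
@org/security.md
@org/architecture.md
@team/coding.md
@team/infrastructure.md

Cursor and Copilot achieve the same effect with their own configuration files; the mechanism differs, but the principle is identical: wire the canon in once so nobody has to remember to paste it.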
Common Objections
"We Already Have Confluence / Notion"
Yes, and when did anyone last read it?
Canon is different from traditional documentation:
| Aspect | Documentation (Confluence) | Canon (Markdown) |
|---|---|---|
| Primary consumer | Humans (maybe) | AI (every session) |
| Staleness detection | Manual (if someone reads it) | Automatic (bad output) |
| Maintenance incentive | Low (no direct impact) | High (affects daily work) |
| Version control | Built-in wiki versioning | Git (full history, blame, PRs) |
| Location | Separate system | Lives in repo |
Confluence is for humans to maybe read. Canon is for AI to always read. Different purposes, different dynamics.
"This Is Just Documentation"
No. Documentation is typically:
- Written after the fact (describes what exists)
- Read by humans (sometimes)
- Passive (sits there hoping to be useful)
Canon is:
- Written to influence future output (prescribes what should happen)
- Read by AI (every session)
- Active (directly shapes what gets built)
The difference is usage. Documentation sits in a wiki. Canon is injected into every AI session. That usage pattern changes everything about how it gets maintained.
"Won't This Get Out of Sync?"
Yes, if you don't use it. No, if it's part of the workflow.
The feedback loop keeps it honest:
- AI reads canon every session
- Wrong content → wrong output
- You notice immediately (the output doesn't match reality)
- You fix the canon (takes 30 seconds)
- Future sessions benefit
We'll formalize this in Chapter 6 (Definition of Done v2.0): updating canon becomes part of completing work. When the canon is actively used, staleness is visible and fixing it is incentivized.
Chapter Summary
- • Three layers optimize for both fast individual iteration and stable organizational learning
- • Personal layer (learned.md): fast, messy, experimental—your sandbox for daily learnings
- • Team layer (coding.md, security.md): shared conventions, specialist knowledge, peer-reviewed via PRs
- • Org layer (architecture.md): enterprise-wide standards, conservative updates, architect-reviewed
- • Knowledge flows upward through promotion (PR process); constraints flow downward through inheritance
- • Canon differs from documentation: it's actively used every session, staleness is immediately visible
- • Version control provides history, attribution, and the ability to experiment and roll back
What's Next
We've structured how knowledge is organized across three layers. But there's a special class of artifact that deserves its own treatment: the design document.
In Chapter 4, we'll explore a radical rethinking: what if code isn't the primary artifact anymore? What if design documents are the product, and code is just the compiled output?
Design Documents as Gospel
"Delete your code. Regenerate it tomorrow. The design document is the asset now."
This sounds radical. By the end of this chapter, it will make perfect sense.
For more than half a century, software development has centered on one artifact: code. Code is what runs. Code is what's reviewed. Code is what's tested. Code is "the work."
That's changing. Not because code doesn't matter—it still runs in production, gets tested, and drives business value. But because something fundamental shifted in the economics of software creation.
This chapter makes the most counter-intuitive claim in the book: code is ephemeral, design documents are the real asset. This isn't a prediction about the future. It's a description of what's already happening for teams that have figured this out.
The Old Mental Model
Let's start by acknowledging what was true before—and why it made sense.
Code as the Source of Truth
For decades, code was obviously the primary artifact:
- • Code is what executes—it defines system behavior
- • Code is what gets versioned in Git repositories
- • Code is what gets reviewed by senior engineers
- • Code is what tests validate against
Documentation, when it existed at all, was secondary. Written after the fact. Often stale. Describing what the code does, not what it should do.
This hierarchy made sense when writing code was the hard part. Translating design to implementation required deep expertise. Every line had to be carefully crafted. Every edge case had to be explicitly handled.
Why Documentation Failed
Traditional documentation had predictable failure patterns:
Why Documentation Was Written
- • Afterthought—when "real work" was done
- • Compliance checkbox for process
- • Onboarding aid for new team members
- • Archive of historical decisions
Why It Rotted
- • No enforcement—code could drift freely
- • Wrong audience—written for humans who rarely read it
- • Maintenance burden—separate from actual work
- • No feedback loop—staleness wasn't visible
Result: documentation became a liability, not an asset. Teams gave up on it. Knowledge stayed in people's heads, walked out the door when they left, and had to be rediscovered by the next generation.
What Changed
The shift isn't coming. It's here. Three fundamental changes rewrote the economics of software creation.
AI Made Code Generation Cheap
"AI is nearly a genius at coding, the very first go, if you've got a good design document. It's the best software developer I've seen in my whole life, in terms of correct code per minute."
Before AI, writing code was the expensive, skilled work. Translating requirements into working software required years of expertise, deep knowledge of languages and frameworks, and meticulous attention to detail.
After AI, code generation is cheap, fast, and increasingly reliable. Modern models can take a well-specified design and produce correct, idiomatic code on the first pass. Not perfect—but remarkably good.
If code is cheap to generate, what's the expensive work now?
The thinking that precedes the code. The design. The architecture. The constraints and invariants. The edge cases and failure modes. The "what" and "why" before the "how."
The Skill Shift
Old Bottleneck
Problem: Translating design to code (typing, syntax, patterns)
Required skill: Deep knowledge of language idioms, frameworks, and implementation patterns
Time sink: Converting architectural intent into thousands of lines of correct code
New Bottleneck
Problem: Knowing what to build (design, architecture, constraints)
Required skill: Problem decomposition, system thinking, constraint identification
Time sink: Specifying intent clearly enough that AI can execute correctly
The better you are at specifying what you want—the architecture, the constraints, the edge cases—the better the AI output. The fewer iterations needed. The less debugging required.
The Paradigm Shift
This isn't a subtle change. It's an inversion of how we think about software artifacts.
The Inversion
| Dimension | Old Pattern | New Pattern |
|---|---|---|
| Primary artifact | Code | Design document |
| Documentation role | Afterthought describing code | Input to generation |
| Source of truth | What the code does | What the design specifies |
| When they diverge | Update documentation to match code | Fix design, regenerate code |
| Learning captured in | Code patterns and comments | Design docs and canon files |
| Review focus | Line-by-line code review | Design review before coding |
The workflow becomes: Design Doc → AI generates code → Test → Extract learnings → Update design doc.
The design document is no longer an afterthought. It's the input. The specification. The source of truth.
What "Ephemeral Code" Actually Means
Let's be clear about what this doesn't mean: it's not "delete everything constantly" or "production code doesn't matter."
Production code still matters. It runs. It's tested. It's deployed. It serves customers.
What "ephemeral" means is: code can be regenerated from design if needed.
When Bugs Are Found
❌ Old Approach: Nurse the Code
- • Patch the bug in the code directly
- • Add band-aid fixes and edge case handlers
- • Code accumulates complexity over time
- • Design and implementation drift apart
Outcome: Technical debt compounds, system becomes harder to reason about
✓ New Approach: Fix the Design
- • Ask: what did the design doc miss?
- • Update the design to address root cause
- • Regenerate code from updated design
- • Design remains source of truth
Outcome: Design captures learning, code stays aligned with intent
Design Docs as the Durable Asset
The design document persists across changes that would normally require major rework:
- • Model upgrades — New model reads the same design doc, produces better code
- • Team changes — New member reads design to understand intent
- • Refactoring — Regenerate from updated design when patterns change
- • Bug fixes — Fix the design, regenerate the code
The intellectual capital lives in:
✓ Architecture decisions and their rationale
✓ Patterns, conventions, and anti-patterns
✓ Constraints, invariants, and business rules
✓ Edge cases, failure modes, and recovery strategies
"The real intellectual capital is: Your architecture canon. Your patterns, conventions, and anti-patterns. Your learned.md of 'never do this again.'"
Why This Is Better
This isn't just a different way of working. It's a better way of working. Here's why.
Catch Problems Earlier
Code review catches problems after code is written. Design review catches problems before code is written. The time difference is massive.
Bad Path: Code First, Review Later
Junior developer codes for 3 days → senior reviews PR → finds fundamental design flaw → 3 days of work thrown away → start over.
Cost: 3+ days wasted, demoralized developer, delayed delivery
Good Path: Design First, Review Early
Junior writes design doc → senior reviews in 30 minutes → spots flaw → design corrected → coding starts with sound foundation.
Cost: 30 minutes of review time, problem solved before implementation
Time savings: 30 minutes of design review vs 3+ days of rework.
Smaller Surface Area
Reviewing code means reviewing enormous surface area. Reviewing design means reviewing distilled intent.
Code has syntax, formatting, variable names, implementation details, edge case handling, tests, error handling, logging—thousands of lines for a modest feature.
Design docs have intent, constraints, interfaces, architecture, patterns, invariants—hundreds of words for the same feature.
Reviewing design is faster, more focused, and catches bigger issues.
Better for AI Reasoning
Code is detail-first. Lots of tokens spent on syntax. Semantics scattered and implicit. AI has to infer intent from implementation.
Design docs are meaning-first. "This module does X with inputs Y." "These invariants must hold." "These are the failure modes."
Same context window, dramatically higher signal-to-noise ratio.
Enables Project-Level Understanding
The Whole-System Question
With Code
Can you fit your whole codebase in a prompt?
No (or barely, with severe degradation)
Even when technically possible, context rot makes large code dumps unreliable for reasoning.
With Design Docs
Can you fit all your design docs in a prompt?
Often yes, with clarity
Higher signal density means the AI can reason about system-level concerns.
What This Enables
- • "Which components touch user data?"
- • "Where are the security boundaries?"
- • "What depends on what?"
- • "How does authentication flow across services?"
Code-level analysis can't do this at scale. Design-level analysis can.
The Mental Shift Required
This paradigm doesn't just change process—it changes how different roles think about their work.
For Individual Developers
Stop thinking: "I write code"
Start thinking: "I design systems; AI writes code"
Your value is in:
- • Understanding the problem deeply
- • Making architectural decisions
- • Specifying constraints and invariants
- • Identifying edge cases and failure modes
Code typing is no longer the skilled work.
For Senior Engineers and Architects
Stop: Reviewing every line of code (impossible at scale)
Start: Reviewing designs before code is written
Your leverage is in:
- • Shaping design approaches across the team
- • Catching conceptual flaws early
- • Curating the canon (Chapter 3)
- • Ensuring designs follow established patterns
"I hate with a passion, reviewing code level stuff, because it's so slow. Bring me some design documents, and let's have a talk about that."
For Team Leads
Stop: Measuring velocity by lines of code shipped
Start: Measuring quality of designs and learning captured
New questions to ask:
- • "How clear is the design doc?"
- • "What did we learn from this ticket?"
- • "Was the canon updated with insights?"
- • "Can this design be understood by new team members?"
Code is output. Design and learning are the process.
For the Organization
Stop: Treating code as the primary artifact
Start: Treating knowledge assets as primary
Implications for investment and process:
- • Invest in design doc infrastructure and templates
- • Invest in canon maintenance and curation
- • Measure design quality, not just shipping velocity
- • Hire for system thinking, not just coding speed
"If you're really, really good at software, you'll find AI can easily 10x you. But you have to know what you want, how you want it written, what database to use, what interfaces to use. You have to know everything that you want."
Architecture Decision Records (ADRs)
A special class of design document deserves explicit treatment: the Architecture Decision Record.
What ADRs Are
An ADR is a document that captures a significant architectural choice. It follows a simple structure:
- Context — What situation are we in? What options exist?
- Decision — What did we decide to do?
- Consequences — What are the implications (positive and negative)?
Key characteristic: ADRs are never deleted, only superseded. They form a permanent record of architectural evolution.
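To make the structure concrete, here is a short, hypothetical ADR; the service and decision are invented for illustration and echo the rate-limiting example used later in this book:
- Title: ADR-012: Rate-limit user registration with Redis
- Context: Registration is being abused by scripted sign-ups. Redis already exists in our infrastructure; an in-process counter would not survive restarts or coordinate across instances.
- Decision: Implement a Redis-backed sliding-window limiter in the registration middleware.
- Consequences: Positive: limits are shared across instances and reuse existing infrastructure. Negative: the registration path now depends on Redis, so a fail-open or fail-closed policy must be chosen. Status: accepted, not superseded.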
Why ADRs Matter in This Model
ADRs capture the "why" behind architecture. When someone asks "why do we do it this way?" you point to the ADR. Context, decision, and consequences are documented. No need to find the person who made the decision. No need to reverse-engineer intent from code.
Benefits compound:
- • Code refactored? ADR still explains the decision
- • Person left? ADR preserves their reasoning
- • New team member? ADRs provide onboarding
- • AI generating code? ADRs provide architectural constraints
Counter-Arguments Addressed
This paradigm shift invites legitimate questions. Let's address them directly.
"But Production Code Still Matters"
Absolutely. Code runs. Tests validate. Deployments happen. The claim isn't "code doesn't matter."
The claim is: design is the source of truth; code is derived from it.
When they diverge, you fix the design and regenerate—not the reverse. Production code is important, but it's not the primary artifact that captures your thinking.
"You Can't Regenerate Complex Systems"
Fair point for legacy systems without design documentation.
The path forward:
- • New code is designed-first from day one
- • Legacy code gains design docs over time (when touched)
- • Eventually, design coverage is high enough for regeneration
The goal is the end state, not instant transformation. Gradual migration, not big-bang rewrite.
"This Adds Overhead"
Upfront: yes, writing design docs takes time.
But it's not additional time—it's shifted time.
Time Investment Comparison
Old Approach
- • Code for 2-3 days
- • Wait for review (1+ days)
- • Explain approach in PR comments
- • Address review feedback (hours to days)
- • Second review round if design flaw found
Total: 3-5+ days
Design-First Approach
- • Write design doc (2-4 hours)
- • Design review (30 minutes)
- • AI generates code (minutes)
- • Code review focused on conformance (1 hour)
- • Ship with confidence
Total: 1-2 days
Net time is often less, and problems are caught earlier when they're cheaper to fix.
"Developers Won't Write Design Docs"
Fair concern. Incentives matter.
When design docs are:
- • Required for AI to do good work (immediate value)
- • Part of Definition of Done (process requirement)
- • The thing that gets reviewed, not code (shifts attention)
- • The thing that captures your thinking (intellectual ownership)
Then developers write them because they're useful, not bureaucratic overhead.
What This Means for Code Review
Code review doesn't disappear. It transforms.
The Split
| Review Type | When | Focus | Who |
|---|---|---|---|
| Design Review (Primary) | Before coding | Architecture, constraints, approach | Architects, seniors |
| Code Review (Secondary) | After implementation | Conformance to design, implementation details | Tools + peers |
AI-Assisted Code Review
Tools can now scan code for:
- • Type consistency and null safety
- • Error handling patterns
- • Style violations and formatting
- • Security issues (SQL injection, XSS, etc.)
The human focus shifts to the higher-leverage question: does code match design?
Conformance Checking
New concept in the design-first workflow: automated design-code conformance checking.
Questions AI can answer:
- • "Design says we validate input—is there validation?"
- • "Design says we log errors—is there logging?"
- • "Design says we never store PII field X—does code store it?"
- • "Design specifies these interfaces—are they implemented?"
This becomes part of Definition of Done (Chapter 6). It frees humans from line-by-line review to focus on conceptual coherence.
Key Takeaways
- • AI made code generation cheap; the expensive work is now design, architecture, and constraint specification
- • Code is ephemeral (regenerable); design documents are the durable intellectual asset
- • Design docs have smaller surface area, higher signal density, and work better with AI reasoning
- • Catch problems in design review (minutes) not code review (days later, after implementation)
- • Architecture Decision Records (ADRs) capture the "why" behind decisions permanently
- • Code review splits into design review (primary, before coding) and conformance checking (secondary, automated)
- • The mental shift required: "I design systems" not "I write code"
What's Next
We've established that design documents are the asset. But what does a design-first workflow actually look like in practice?
Chapter 5 presents the Design-Compiler Pattern: a concrete, step-by-step workflow you can adopt to turn this paradigm shift into daily practice.
Chapter 5: The Design-Compiler Pattern
Chapter 4 made the case that design documents are the asset. Now let's get practical: what does a design-first workflow actually look like when you're sitting down to implement a feature at 9 AM on Monday?
The answer is a repeatable seven-step process we call the Design-Compiler Pattern. The metaphor is deliberate: a compiler takes high-level source code and produces machine instructions. In this workflow, AI takes design documents and produces implementation code. The design doc is your "source code"; the generated code is the "binary."
This chapter provides the concrete mechanics. By the end, you'll have a workflow you can implement tomorrow—one that shifts your team from "vibe coding" to systematic, design-driven AI collaboration.
The Seven-Step Workflow
Before diving into each step, here's the complete workflow at a glance:
The Design-Compiler Workflow
- 1. Gather Context — Collect requirements, screenshots, constraints
- 2. Assemble Canon — Pull in relevant knowledge files (coding.md, security.md)
- 3. Enter Plan Mode — AI designs, does not code
- 4. Produce Design Document — AI outputs architecture, not implementation
- 5. Review Design — Human review before any code is written
- 6. Implement from Design — AI generates code from approved design
- 7. Run Conformance Check — Verify code matches design
Why These Steps in This Order
The workflow divides cleanly into three phases:
Steps 1–4: Thinking
Preparation and design. This is where the intellectual work happens—understanding the problem, gathering context, producing a clear plan.
Step 5: Quality Gate
The design review checkpoint. Catch conceptual problems before any code is written. This is where senior judgment matters most.
Steps 6–7: Execution
Implementation and validation. If the design is good, these steps should be quick and clean—almost mechanical.
The key insight: most of your time should be in steps 1–5. If you've done the thinking work well, the coding work becomes straightforward. This inverts the traditional pattern where developers spend 70% of their time coding and 10% planning. In the Design-Compiler Pattern, you spend 50% on design and 30% on implementation.
"If you've got a good design document, AI is nearly a genius at coding the very first go. It's the best software developer I've seen in my whole life, in terms of correct code per minute."
The corollary is equally important: if AI is producing poor code, the problem is usually upstream in your design document. Better design documents produce better first-pass code, which means fewer iterations and faster delivery.
Step 1: Gather Context
AI is only as good as the context you provide. The first step is systematic collection of everything the AI needs to understand what you're building and why.
What to Collect
Business Requirements
The user story, ticket, or feature request. Include acceptance criteria and any business constraints ("must ship by Q1," "can't break existing API").
Example: "Add rate limiting to user registration to prevent abuse. Must handle 100 req/sec normal load. Response latency < 100ms."
Technical Context
Relevant code paths, database schemas, API contracts. Point to specific files or modules affected by this work.
Example: "Current registration endpoint in `src/api/auth.ts`, user table schema, existing middleware pattern."
Visual Artifacts
Screenshots of current UI, error messages being fixed, mockups or wireframes. Visual context helps AI understand what users actually see.
Example: Screenshot of current registration flow, plus error logs showing abuse pattern.
Constraints
Performance requirements, security considerations, compliance needs. These are non-negotiables that bound the solution space.
Example: "Must use Redis (already in infrastructure), no user PII in logs, GDPR compliance required."
Step 2: Assemble Canon
Now that you have the specific context for this task, layer in the organizational knowledge that should inform all work: your team's canon files.
Which Canon Files to Include
Always Include
- • coding.md — Team conventions, patterns, anti-patterns
- • learned.md — Your personal failure patterns and preferences
Include When Relevant
- • security.md — If touching auth, data, external calls
- • infrastructure.md — If touching deployment, cloud resources
- • Domain-specific files — billing.md, user.md, etc.
Org-Level Files
- • Include when working on cross-cutting concerns
- • May be auto-included by tooling (e.g., CLAUDE.md)
Different AI tools handle canon inclusion differently. Claude Code auto-includes CLAUDE.md and allows @-mentions for additional files. Cursor uses .cursor/rules files. The principle is universal: make canon inclusion automatic where possible. You want your scaffolding injected into every session without manual overhead.
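Where tooling does not handle this for you, a small script can do the assembly. A minimal sketch in TypeScript, assuming a canon/ folder organized along the three-layer lines described in Chapter 3 (the file names and tags here are illustrative, not a prescribed layout):

```typescript
// Minimal sketch: gather the canon files relevant to a task and emit a single
// context block to inject at the start of an AI session. Paths, tags, and the
// canon/ layout are assumptions for illustration.
import { existsSync, readFileSync } from "node:fs";

const ALWAYS = ["canon/coding.md", "canon/learned.md"];
const WHEN_RELEVANT: Record<string, string[]> = {
  auth: ["canon/security.md"],
  infra: ["canon/infrastructure.md"],
  billing: ["canon/billing.md"],
};

export function assembleCanon(tags: string[]): string {
  const files = [...ALWAYS, ...tags.flatMap((tag) => WHEN_RELEVANT[tag] ?? [])];
  return files
    .filter((file) => existsSync(file))
    .map((file) => `--- ${file} ---\n${readFileSync(file, "utf8")}`)
    .join("\n\n");
}

// Example: a ticket touching registration and Redis configuration.
// const context = assembleCanon(["auth", "infra"]);
```

The exact mechanism matters less than the principle: the relevant canon should arrive in the session without anyone having to remember to paste it.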
Step 3: Enter Plan Mode
This is where the Design-Compiler Pattern diverges most sharply from traditional AI-assisted coding. Instead of asking AI to write code, you explicitly tell it: think, don't implement.
What Plan Mode Is
Plan mode is a constraint you impose: the AI must produce a design document, not code. Different tools implement this differently—Claude Code has an explicit plan mode command; Cursor can be prompted with "Design first, don't code"; manual workflows include the instruction in the prompt.
The key is the constraint: no code output in this phase. When AI knows it can't write code, something interesting happens—its reasoning changes.
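If your tool has no explicit plan mode, the constraint can live in the prompt itself. A hedged example, to adapt to your own context: "You are in plan mode. Produce a design document only: problem, constraints, proposed architecture, interfaces, failure modes, and open questions. Do not write code or pseudocode in this phase."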
Why Plan Mode Works
AI reasoning capacity is finite. When generating code, much of that capacity goes to syntax, formatting, variable names—the mechanical details of implementation. When you remove the option to code, all that attention redirects to architecture, constraints, and trade-offs.
"When it's not outputting the code, it's thinking more with the world model side of things. The world model stays online when not burning cycles on syntax."
This mirrors how humans work. If you ask a senior engineer to design something, they think architecturally. If you ask them to implement it immediately, they start thinking about tabs versus spaces. Same brain, different mode. AI exhibits a similar pattern.
Step 4: Produce Design Document
The output of plan mode is a structured design document. This becomes the "source code" that AI will later compile into implementation code.
Design Document Structure
A good design document template provides scaffolding for thinking. Here's a proven structure (adapt the headings to your team):
- Title and ticket link: what this design is for
- Problem and intent: what we're solving and why it matters
- Constraints: performance, security, compliance, infrastructure limits
- Proposed approach: architecture, components, data flow
- Interfaces: API contracts, data shapes, event schemas
- Failure modes and edge cases: what can go wrong and how it's handled
- Open questions: decisions still pending
- Related canon: coding.md, security.md
What Good Design Docs Include
✓ Clear Intent
What problem are we solving? Why does it matter? Stakeholder value.
✓ Explicit Constraints
What can't we change? Performance requirements, compliance needs, infrastructure limits.
✓ Interface Definitions
How do components talk? API contracts, data shapes, event schemas.
✗ Actual Code
That's Step 6. Design stays at conceptual level.
✗ Obvious Patterns
If it's in coding.md, reference it. Don't restate team conventions.
✗ Boilerplate
Reference canon instead of repeating it.
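For instance, a compressed, hypothetical excerpt for the rate-limiting example used later in this chapter might read:
- Problem and intent: Scripted sign-ups are abusing registration; throttle per IP without harming legitimate users.
- Constraints: Redis only (already in infrastructure), under 100ms added latency, no PII in logs.
- Approach: Sliding-window limiter in the registration middleware, 5 requests per minute per IP.
- Interfaces: Applies to the registration endpoint; over-limit requests receive 429 with a Retry-After header.
- Open questions: Should internal load-testing IPs bypass the limit?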
Step 5: Review Design
This is the quality gate. The design review happens before any code is written, which is when problems are cheapest to fix.
What to Check in Design Review
- Does it solve the actual problem stated in requirements?
- Does it follow patterns documented in our canon?
- Are constraints realistic and complete?
- Are there obvious failure modes not handled?
- Is the scope appropriate (not too big, not too small)?
- Are open questions acceptable or blockers?
The review is focused: you're reading hundreds of words, not thousands of lines of code. A thorough design review typically takes 15-30 minutes. Compare this to code review for the same feature, which can take hours to days.
Review Outcomes
✓ Approved
Proceed directly to implementation. Design is solid.
✓ Approved with Minor Changes
Update design doc with quick fixes, then implement.
⚠ Needs Rework
Back to Steps 3–4 with specific feedback. Approach is salvageable but needs revision.
✗ Rejected / Rethink
Fundamental approach is wrong. Start over with different direction.
The design review is the leverage point. After approval, implementation should be straightforward. If implementation reveals fundamental design flaws, that's signal to improve your design process—not to skip design review next time.
Step 6: Implement from Design
Now AI can code. With an approved design document in hand, implementation becomes directive rather than exploratory.
Why This Works Well
AI has clear direction (the design doc), explicit constraints (canon files), defined scope (what's in/out), and pre-decided trade-offs. There's no invention needed—just execution. This is where AI excels: given a clear specification, it generates clean implementation code reliably.
When AI Deviates from Design
It happens—AI may "improve" on the design or misinterpret something. First, check if the deviation is actually better (sometimes it is). If the deviation is worse, point to the design doc and ask for correction. If repeated deviations occur, the design doc is probably ambiguous—clarify it. This is part of the learning loop.
Step 7: Run Conformance Check
The final step verifies that code matches design. This can be partially automated, partially manual—but it's faster and more focused than traditional code review.
Automated Conformance
AI can review code against design:
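One way to wire this up is a small script that hands the approved design and the changed files to a model and asks the conformance questions. A sketch, where `askModel` is a placeholder for whichever model client your team already uses (not a specific vendor API):

```typescript
// Illustrative sketch of a design-to-code conformance check. `askModel` is a
// stand-in for your team's model client; replace it with a real call.
import { readFileSync } from "node:fs";

async function askModel(prompt: string): Promise<string> {
  throw new Error(`askModel is a placeholder; wire it to your model client (prompt was ${prompt.length} chars)`);
}

export async function conformanceCheck(designPath: string, changedFiles: string[]): Promise<string> {
  const design = readFileSync(designPath, "utf8");
  const code = changedFiles
    .map((file) => `--- ${file} ---\n${readFileSync(file, "utf8")}`)
    .join("\n\n");

  return askModel(
    [
      "Check this implementation against its approved design document.",
      "Report: (1) design elements missing from the code,",
      "(2) code behaviour not covered by the design,",
      "(3) constraints or invariants in the design the code appears to violate.",
      `DESIGN:\n${design}`,
      `CODE:\n${code}`,
    ].join("\n\n"),
  );
}
```

The resulting report can be attached to the PR so human reviewers start from a structured diff between intent and implementation.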
This automated check catches structural issues: missing components, interface mismatches, forgotten error handling. It's faster than manual line-by-line review and more systematic.
Manual Verification
Some things still need human eyes: Does the logic make sense? Are there obvious bugs? Is the code maintainable? But the scope is reduced—you're checking conformance to a known design, not reverse-engineering intent from implementation.
Putting It All Together: A Worked Example
Let's walk through a real scenario to see how the seven steps flow in practice.
Scenario: Add Rate Limiting to User Registration
Step 1: Gather Context
- Ticket description and acceptance criteria
- Current registration endpoint code
- Recent abuse patterns from logs
- Performance requirement: < 100ms latency impact
Step 2: Assemble Canon
- coding.md (API patterns)
- security.md (rate limiting standards)
- infrastructure.md (Redis configuration)
Step 3: Enter Plan Mode
"Design a rate limiting solution for user registration. Do not write code."
Step 4: Produce Design Document
AI outputs design: Redis-based sliding window, 5 requests/minute per IP, 429 response with Retry-After header.
Step 5: Review Design
Architect reviews: "Good approach. Add note about bypassing rate limit for internal load testing IPs." Design doc updated.
Step 6: Implement from Design
AI generates rate limiting middleware, tests, documentation. Follows patterns from coding.md.
Step 7: Run Conformance Check
AI verifies all design elements implemented. Human verifies logic is correct, tests pass. Done.
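To ground Step 6, here is a minimal sketch of the middleware that design describes, assuming an Express service and the ioredis client. It is illustrative rather than production-hardened, and the bypass for internal load-testing IPs added during design review is omitted for brevity:

```typescript
// Illustrative sketch only: Redis-backed sliding-window limiter for the
// registration endpoint, per the approved design (5 req/min per IP, 429 with
// Retry-After). Assumes Express and ioredis; not production-hardened.
import type { NextFunction, Request, Response } from "express";
import Redis from "ioredis";

const redis = new Redis(); // connection details would come from infrastructure.md

const WINDOW_MS = 60_000; // one-minute sliding window
const MAX_REQUESTS = 5;   // per IP per window, from the design doc

export async function registrationRateLimiter(req: Request, res: Response, next: NextFunction) {
  const key = `ratelimit:register:${req.ip}`;
  const now = Date.now();

  // Drop requests that have aged out of the window, then count what remains.
  await redis.zremrangebyscore(key, 0, now - WINDOW_MS);
  const used = await redis.zcard(key);

  if (used >= MAX_REQUESTS) {
    // The oldest remaining entry tells us when a slot frees up.
    const oldest = await redis.zrange(key, 0, 0, "WITHSCORES");
    const retryMs = oldest.length ? Number(oldest[1]) + WINDOW_MS - now : WINDOW_MS;
    res.setHeader("Retry-After", Math.ceil(retryMs / 1000).toString());
    res.status(429).json({ error: "rate_limited" });
    return;
  }

  // Record this request and keep the key from living forever.
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.pexpire(key, WINDOW_MS);
  next();
}
```

In the actual workflow this code is generated by the AI from the approved design; the point is that the decisions live in the design doc, and the code can be regenerated from it.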
Time Investment
Design-First Approach:
- Steps 1–2: 10 min
- Steps 3–4: 15 min
- Step 5: 10 min
- Step 6: 15 min
- Step 7: 10 min
- Total: ~1 hour
Traditional Approach (estimated):
- Write code: 1–2 hours
- Code review: 30–60 min
- Rework from review: 30–60 min
- Total: 2–4 hours
Design-first is not slower—it's faster, with fewer iterations and higher quality.
Common Patterns and Variations
The seven-step workflow adapts to different kinds of work. Here are the most common variations:
The Spike Pattern (Exploratory Work)
For experiments and proof-of-concepts:
- Design doc is shorter, more tentative
- Implementation is exploratory ("try this approach")
- Learnings feed into proper design doc
- Then standard workflow for production implementation
The Bug Fix Pattern
For fixing existing bugs:
- Design doc = root cause analysis + fix approach
- May reference original design doc
- Implementation is targeted
- Conformance check includes regression verification
The Refactoring Pattern
For refactoring existing code:
- Design doc = target state + migration approach
- Canon may need updating (new patterns)
- Implementation is phased
- Conformance check ensures functionality preserved
Key Takeaways
- • The Design-Compiler Pattern is a seven-step workflow: gather context, assemble canon, plan mode, design doc, review, implement, conformance check.
- • Steps 1–5 are thinking work (should be majority of time). Steps 6–7 are execution (should be quick and clean).
- • Plan mode keeps AI focused on architecture, not syntax. Design review is the quality gate—problems caught here are cheap.
- • Conformance checking validates code matches design. This pattern is not slower—it's faster with fewer iterations.
- • The workflow adapts: spike pattern for exploration, bug fix pattern for debugging, refactoring pattern for legacy code.
We've got the workflow. But how does this integrate into team practice? How do we make learning extraction systematic? That's what Definition of Done v2.0 is all about—the subject of Chapter 6.
Definition of Done v2.0
Your team ships features every sprint. Tests pass. Code is reviewed. Users are happy. But here's what you're not capturing: what you learned.
The traditional Definition of Done serves a critical function. It creates a quality bar, prevents "almost done" syndrome, and ensures teams share a common understanding of what "finished" means. But it contains a hidden cost that most teams never notice: knowledge evaporation.
Every ticket your team completes contains valuable learnings—edge cases discovered, patterns that worked, approaches that didn't, integration surprises, performance insights, security considerations. With a traditional DoD, these insights evaporate the moment the work is marked complete. The feature ships, but the team doesn't get smarter. They move on to the next ticket, leaving behind knowledge that should compound.
Definition of Done v2.0 changes this. It redefines completion to include not just working software, but captured learning. Every piece of work now leaves what one practitioner calls "a tiny fossil in the org's brain"—a permanent deposit that makes the entire team more capable.
TL;DR
- • Traditional DoD produces features but doesn't capture learning—knowledge evaporates with each ticket
- • DoD v2.0 adds mandatory learning extraction: what were the sticking points, surprises, and anti-patterns discovered?
- • Learnings are triaged to personal, team, or org canon files—creating compounding organizational capability
- • 5 minutes of reflection per ticket prevents hours of rediscovering the same lessons
What DoD Is (And Why It Matters)
The Definition of Done is a formal description of what "done" means for work items. It's a Scrum concept that has transcended its origins to become standard practice across software teams. According to Scrum.org, the DoD is "a formal description of the state of the Increment when it meets the quality measures required for the product." The moment a product backlog item meets the Definition of Done, an increment is born.
A typical enterprise DoD checklist might include:
- □ Code complete and compiles
- □ Unit tests written and passing
- □ Integration tests passing
- □ Code reviewed and approved
- □ Documentation updated
- □ Deployed to staging environment
- □ QA sign-off received
- □ Meets acceptance criteria
This is rigorous quality management. It prevents teams from shipping garbage, creates alignment on expectations, and provides measurable criteria for completion. As LeanWisdom notes, an effective DoD consists of "measurable and verifiable criteria that are agreed upon by the entire team," ensuring shared understanding of completion criteria and avoiding misunderstandings.
But there's a gap. A significant one.
The Learning Gap Nobody Talks About
Traditional DoD asks one question: "Is this work done?" It doesn't ask the equally important question: "What did this work teach us?"
Consider what happens on a typical ticket. Developer A discovers that the payments API returns a 400 status code for invalid amounts, not the expected 422. She handles it correctly in her code. The feature ships. Three months later, Developer B encounters the same API and spends two hours debugging why his error handling isn't working—until he discovers the 400 quirk. Six months after that, Developer C joins the team and...
You see the pattern. The knowledge existed—in Developer A's head—but it wasn't captured. The team pays the same "learning tax" repeatedly. Each developer rediscovers what their teammates already knew. Multiply this across hundreds of tickets and dozens of team members, and the waste becomes staggering.
This is the knowledge evaporation problem. Every ticket contains learnings:
- Edge cases discovered during implementation
- Patterns that worked exceptionally well
- Patterns that seemed good but proved problematic
- Integration surprises—APIs that behave unexpectedly
- Performance characteristics that weren't documented
- Security considerations that emerged during development
Traditional DoD: ship the feature, forget the lessons. The knowledge evaporates when you move to the next ticket, when someone else encounters the same issue, when a new team member joins, or when you hit the same problem six months later having completely forgotten the solution.
"Not just 'Did I close the ticket?' but 'What did this ticket teach us about our system and our process?' That's how you grow people who don't just code features, but evolve the organism."
Teams in this pattern are what we might call "busy but not getting better." Velocity is consistent, features ship every sprint, but organizational capability isn't compounding. The same types of bugs keep appearing. The same types of problems take the same amount of time. New team members take months to ramp up because there's no accumulation of team knowledge. Seniors become bottlenecks, answering the same questions repeatedly.
Shipping work doesn't equal building capability. Traditional DoD doesn't fix this because it was designed to ensure quality of individual work items, not accumulation of team knowledge. That's not a criticism—it's a recognition that the context has changed. In an AI-native development environment, the bottleneck shifts from writing code to capturing and distributing knowledge.
Definition of Done v2.0: The Enhanced Checklist
DoD v2.0 includes everything from traditional DoD, plus mandatory learning extraction. Here's the complete checklist:
Traditional DoD (v1.0)
- ☑ Code complete and compiles
- ☑ Unit tests written and passing
- ☑ Integration tests passing
- ☑ Code reviewed and approved
- ☑ Documentation updated
- ☑ Deployed to staging environment
Learning Extraction (v2.0 additions)
- ☑ Design doc updated (if reality changed during implementation)
- ☑ AI conformance check run (code matches design)
- ☑ Learnings extracted (sticking points, surprises, patterns identified)
- ☑ Canon updated (relevant learnings added to appropriate level)
- ☑ Knowledge promotion considered (should this go up a level?)
Let's examine why each addition matters:
Design doc updated: The design document is the source of truth (as we established in Chapter 4). If implementation revealed that the design was incomplete or inaccurate, update it. Next time AI regenerates from this design, it has the complete picture. The design doc should always reflect reality, not just original intent.
AI conformance check: Automated verification that code matches design. This catches drift before it compounds—when code diverges from design without updating either, you create confusion for future developers and AI systems alike. The conformance check (detailed in Chapter 5) ensures alignment between what you said you'd build and what you actually built.
Learnings extracted: Active reflection on what the work taught. Specific questions answered (we'll detail these shortly). This is not optional—it's a mandatory part of completion. The work isn't done when the feature ships; it's done when the learning is captured.
Canon updated: Learnings don't just sit in a notes file somewhere. They're added to the appropriate canon file—personal learned.md, team coding.md, or flagged for org-level promotion. This makes them immediately usable for all team members' AI sessions. The knowledge becomes active infrastructure, not passive documentation.
Knowledge promotion considered: Every developer actively participates in the knowledge architecture. Personal learning that applies to the team? Propose it via PR. Team learning that applies across the organization? Flag it for senior architects to review. This creates a continuous upward flow of validated insights.
The Three Questions at Every Ticket
Learning extraction doesn't require elaborate ceremonies or hour-long retrospectives. It requires three specific questions answered at ticket completion—typically taking about five minutes:
1. What were the sticking points?
Where did you get stuck? Where did the design not match reality? What took longer than expected? What required multiple iterations?
This identifies gaps between expectation and reality—the most valuable kind of learning.
2. What surprised you?
Edge cases you didn't anticipate? Integration issues that weren't in the design? Performance characteristics that differed from expectations? Behaviour that contradicted assumptions?
Surprises are teaching moments—they reveal incomplete mental models.
3. What should we never do again?
Patterns that caused problems? Approaches that wasted time? Assumptions that were wrong? Anti-patterns discovered through painful experience?
Anti-patterns are as valuable as patterns—knowing what not to do prevents waste.
These questions force reflection that would otherwise be skipped. Sometimes a ticket is genuinely straightforward—no surprises, no sticking points, design was perfect. That's valid. Answer: "No significant learnings." But forcing the question surfaces insights that would otherwise be lost. Even "routine" tickets sometimes reveal patterns when you pause to examine them.
The key is making this lightweight. Five minutes, not a ceremony. Capture can happen in the PR description, in a ticket comment, in a standup note, or in a dedicated learning log. The point is capture, not elaborate process. As one team lead put it: "As part of the whole process, you have to ask individual developers what were the sticking points on this, what did you have trouble with, where was the design document ineffective. And ask them to PR that back into the knowledge, into the md files, for other projects. That's part of the delivery for each piece of work."
Learning Extraction Template
Ticket: [link]
Completed by: [name]
Date: [date]
Sticking Points:
- [What took longer than expected?]
Surprises:
- [What didn't behave as anticipated?]
Never Do This Again:
- [What patterns should we avoid?]
Canon Update:
[ ] Added to personal learned.md
[ ] PRed to team coding.md (if applicable)
[ ] Flagged for org promotion (if applicable)
Triage Logic: Where Does This Learning Belong?
After extracting learnings, you need to decide where they belong in the three-layer canon architecture (Chapter 3). The triage logic is straightforward:
Personal Only (→ learned.md)
Specific to your workflow, not clearly generalizable, still experimental or unvalidated.
Example: "I prefer to see error handling first, then happy path"
Team-Wide (→ coding.md, infrastructure.md, etc.)
Applies to multiple team members, likely to recur in team's work, validated through this ticket.
Example: "The payments API returns 400 for invalid amounts, not 422 as documented"
Org-Wide (→ flag for promotion)
Applies across teams, represents genuine organizational learning, significant enough to cascade everywhere.
Example: "Never trust user-supplied file paths—sandbox all file operations"
The criteria for promotion are simple but important:
- Not obviously wrong: The learning has been validated through actual experience
- Useful to more than one person: It's not just a personal preference
- Likely to recur: The situation will come up again for teammates
When updating team-level canon, use the same process as code:
- Create a PR with the canon change
- Link to the ticket that surfaced the learning
- Explain what was learned and why it applies broadly
- Get peer review (just like code)
- Merge when approved
This creates an audit trail (git history), attribution (who discovered it), context (why it was added), and a quality gate (review before it affects everyone). The canon becomes as rigorously managed as your codebase—because it's equally important.
The Accumulation Effect
Each ticket adds a small deposit to the team's knowledge bank. The compounding happens over multiple timescales:
Over weeks: Dozens of learnings captured. The team starts noticing they're not hitting the same problems repeatedly.
Over months: Substantial knowledge base accumulated. New team members ramp up faster because the canon contains the edge cases and gotchas that would have taken months to discover organically.
Over quarters: Comprehensive canon that captures the team's collective experience. The knowledge base becomes a genuine competitive advantage—your team knows things that would take competitors months or years to learn.
The future self benefit is immediate and tangible. When you hit a similar problem, the canon already has the learning. AI surfaces it automatically in the context it provides. There's no need to rediscover, no need to ask around, no need to grep through old PRs hoping someone encountered this before. When a teammate hits the problem, they get the same benefit—your learning helps them, their learning will help you. The multiplication is automatic.
Onboarding Acceleration
A new team member joins. They inherit the full canon immediately:
- All the "never do this again" patterns
- All the edge cases discovered
- All the integration surprises
- All the performance characteristics
Their AI sessions are immediately informed by team history. Ramp-up time drops from months to weeks—or weeks to days.
Perhaps the most powerful compounding effect involves model upgrades. When a better AI model releases—and they release every few months now—it reads your enhanced canon. All your accumulated learnings multiplied by better reasoning capability. Output quality jumps immediately. Teams without this infrastructure start from zero at each upgrade. Teams with DoD v2.0 get multiplicative gains automatically. The better your canon, the more you benefit from each model improvement. We'll explore this flywheel effect in depth in Chapter 9.
Key Insight
The difference between DoD v1.0 and v2.0 is compounding. V1.0 produces features. V2.0 produces features and learning. Over time, v2.0 teams don't just ship more—they ship with fewer mistakes, faster ramp-up, and increasing capability. The gap widens with every sprint.
System Legibility and Project Intelligence
DoD v2.0 creates a secondary benefit that traditional DoD doesn't provide: system legibility. When design documents describe intended behaviour, canon files describe known constraints, and ticket learnings describe discovered reality, you have comprehensive understanding of the system that's inspectable and queryable.
Traditional project status conversations rely on vibes. "How's the project going?" gets answered with gut feelings and rough percentages that may or may not reflect reality. With design docs and learning capture, you can ask more precise questions:
- "What's the design coverage for this scope?" (How many of our features have design docs?)
- "What learnings have we accumulated?" (What has this project taught us?)
- "Which areas have the most surprises?" (Where is our understanding weakest?)
- "How many open questions remain?" (What critical decisions are still pending?)
AI can analyze design documents to assess project state with far more precision than human estimation. As one architect noted: "You could take AI to have a look at the documents and work out what percentage of the statement of work we've done, or where we're up to on the current sprint." This isn't replacing human judgment—it's augmenting it with data that would be nearly impossible to gather manually.
Incident response improves dramatically. When a bug appears in production, the traditional approach involves grepping through code, guessing where to look, and hoping someone remembers encountering something similar. With design docs and canon, the process becomes:
- Search design docs first (semantic, fast): "Which component handles user authentication?"
- Check canon for known issues: "Have we seen this error pattern before?"
- Narrow to specific code second, now that you know where to look
- Update design doc and canon with the fix, so the next person doesn't repeat the investigation
The system becomes self-documenting not through post-hoc documentation, but through the natural accumulation of design artifacts and extracted learnings. This is documentation that stays current because it's used actively, not documentation that rots in a wiki.
Integration and Enforcement
DoD v2.0 isn't a separate process—it's an extension of existing workflows. The integration points are straightforward:
PR templates: Add learning extraction questions to your pull request template. Before submitting, developers answer the three questions. Reviewers can see both the code changes and the extracted learnings in one place.
Ticket workflows: Add a "Learning Captured" checkbox to your ticket completion criteria. Tickets can't be moved to "Done" until learnings are documented.
Sprint review: Include a brief section on "What we learned this sprint." Highlight the most valuable canon updates. This creates visibility and reinforces that learning extraction is valued work.
Team lead enforcement doesn't require heavy process—it requires consistent questions:
- During standup: "What did you learn from that ticket?"
- During sprint review: "What canon updates came out of this sprint?"
- During 1:1s: "Show me your recent learning extractions—what patterns are you noticing?"
Visibility and expectation create habit. When learning extraction is treated as essential rather than optional, teams internalize it quickly.
Common resistance patterns emerge predictably, and all have straightforward responses:
"This is extra work": It's five minutes per ticket. The alternative is repeating the same discoveries—that's more work, not less. Front-load the investment, save time later.
"I don't have time": If you don't have five minutes for learning extraction, you'll spend hours rediscovering. Which is actually faster?
"My learnings aren't important enough": You won't know until you capture them. Patterns emerge from many small observations. What seems minor may be critical in aggregate.
"The canon will get too big": Pruning is part of maintenance. Not everything goes to team or org level. The three-layer system provides natural filtering. Big org canons are usually a sign of trying to standardize too much at the wrong level—that's an architectural problem, not a DoD problem.
Measuring Impact
How do you know if DoD v2.0 is working? Track both leading indicators (adoption) and lagging indicators (impact):
Leading Indicators
- Extraction completion rate: What % of tickets have learning extraction completed?
- Canon update frequency: How often are canon files being updated?
- Promotion rate: How often do learnings move from personal → team → org?
- Canon freshness: When was each canon file last meaningfully updated?
Lagging Indicators
- Repeated problem rate: How often do we encounter previously-solved issues?
- Onboarding time: How long until new members are productive?
- Design accuracy: How often does design match implementation reality?
- Model upgrade impact: How much does output quality improve at model releases?
Beyond metrics, watch for qualitative signals that indicate the culture is shifting:
- "We saw this before—it's in the canon" (knowledge retrieval working)
- "The new person found the answer in coding.md" (onboarding acceleration)
- "The design doc already covered that edge case" (design quality improving)
- "When we upgraded to [new model], our outputs improved immediately" (flywheel effect)
These phrases indicate that DoD v2.0 has moved from process compliance to genuine organizational capability.
Zooming Out: The Complete System
DoD v2.0 is one component of a larger organizational learning system. Understanding how the pieces fit together reveals why this approach is so powerful:
- Canon (Chapter 3): Where knowledge lives—the three-layer structure of personal, team, and org files
- Design Docs (Chapter 4): Where intent is captured—design as primary artifact, code as derived
- Design-Compiler Workflow (Chapter 5): How work is done—the seven-step process from context to validated code
- DoD v2.0 (this chapter): How learning is extracted—the mandatory reflection and capture
- Learning Extraction Ritual (Chapter 7): The specific practice—making extraction systematic and lightweight
- Knowledge Promotion (Chapter 8): How learnings move up—the refactoring of knowledge across layers
The flywheel effect emerges from the interaction of these components:
- DoD v2.0 feeds the canon with extracted learnings
- Canon feeds better AI sessions through richer context
- Better AI sessions produce better designs
- Better designs produce smoother implementation
- Smoother implementation means more capacity for learning extraction
- More learning extraction feeds the canon...
Each cycle strengthens the next. This is compounding in action—not just additive improvement, but multiplicative capability growth. Chapter 9 will detail this flywheel and show how model upgrades amplify the compounding effect.
"Done is no longer 'it works on my machine and the tests are green.' It's: the system behaves as designed, is legible via its design docs, has updated those docs when reality changed, and has pushed any generalisable lesson up into the right layer."
A team running DoD v2.0 at maturity exhibits characteristics that are immediately visible:
- They never encounter the same surprise twice—if it happened before, it's in the canon
- They onboard new members in days or weeks, not months—the canon accelerates the learning curve dramatically
- Their design docs accurately predict implementation—because learnings flow back into design improvements
- Their canon is comprehensive and current—because staleness creates bad outputs, which creates immediate incentive to fix
- Each model upgrade produces dramatic output improvements—because better reasoning meets better scaffolding
- The team is continuously getting better, not just busy—capability compounds visibly over time
This is the end state: organizational learning running at near-individual speed, knowledge compounding across the team, and each improvement multiplying the value of every subsequent improvement.
Chapter Summary
- → Traditional DoD produces features but doesn't capture learning—knowledge evaporates with each completed ticket
- → DoD v2.0 adds five mandatory elements: design doc updates, AI conformance checks, learning extraction, canon updates, and knowledge promotion consideration
- → Three questions at every ticket: What were the sticking points? What surprised you? What should we never do again?
- → Learnings are triaged to personal, team, or org canon—creating structured knowledge accumulation
- → Five minutes per ticket front-loads investment that saves hours of rediscovery later
- → Every ticket leaves a "fossil in the team's brain"—permanent knowledge that compounds over time
- → The system becomes legible, measurable, and continuously improving—not just busy, but genuinely better
- → Compounding effect: teams with DoD v2.0 get better exponentially; teams without just get busy linearly
We've defined what DoD v2.0 includes and why it matters. But how do you actually do learning extraction in practice? What makes it effective versus performative? Chapter 7 introduces the Learning Extraction Ritual—a specific, repeatable practice you can implement immediately to make organizational learning systematic rather than accidental.
The Learning Extraction Ritual
Chapter Preview
You've added learning extraction to your Definition of Done. But what does that actually look like in practice?
This chapter provides the detailed, practical guide: specific techniques, templates, and habits you can implement immediately—including how to mine conversation history for automated learning extraction.
"The difference between knowing you should extract learnings and actually doing it is having a ritual simple enough to become automatic."
In Chapter 6, we added "learning extraction" to your Definition of Done. The concept is clear: every piece of work should leave knowledge artifacts that improve future work. But there's a massive gap between understanding the principle and building the habit.
I've watched teams struggle with this gap. They embrace the concept intellectually. They add it to their DoD checklist. Then weeks later, it's fallen away—not because people don't believe in it, but because they don't have a concrete ritual for doing it.
This chapter closes that gap. By the end, you'll have a five-minute ritual you can start using today, templates for canon updates, and strategies for mining your conversation history to automate learning extraction.
The Three Questions Framework
At the end of every piece of work—every ticket, every PR, every substantial chunk of implementation—you ask yourself three questions. Not ten. Not a complex reflection exercise. Just three targeted questions designed to surface the learnings that matter.
These questions work because they're specific enough to trigger concrete memories but broad enough to catch different types of learning. They take about five minutes to answer if you're honest and focused.
Question 1: What Were the Sticking Points?
What this surfaces:
- Where the design didn't match reality
- Where estimates were wrong
- Where complexity was underestimated
- Where dependencies surprised you
How to ask it:
- "What took longer than I expected?"
- "Where did I get stuck and have to figure something out?"
- "What made me say 'wait, this isn't working' during implementation?"
Example answers:
- "The design said the API returns JSON, but it actually returns XML in error cases"
- "I expected the auth middleware to handle this, but it doesn't run on WebSocket connections"
- "The database query was slow—had to add an index we didn't plan for"
What to do with it: Update the design doc with the correction. Add to coding.md if it's a pattern others will hit. Add to personal learned.md if it's specific to this context.
Question 2: What Surprised You?
What this surfaces:
- Edge cases not in requirements
- Integration behaviour differences
- Performance characteristics
- Security considerations
- User behaviour patterns
How to ask it:
- "What did I discover that wasn't in the ticket or design?"
- "What behaviour did I find that I didn't expect?"
- "What would have bitten me if I hadn't tested carefully?"
Example answers:
- "The payment provider returns success but charges fail silently for certain card types"
- "Users submit forms with emoji in the name field—we weren't handling Unicode properly"
- "The third-party API rate limits differently for GET vs POST requests"
What to do with it: Add edge cases to test suite. Update design doc with discovered constraints. Add to team canon if it's a service others will use.
Question 3: What Should We Never Do Again?
What this surfaces:
- Anti-patterns discovered
- Approaches that wasted time
- Wrong assumptions
- Footguns in the codebase
How to ask it:
- "What did I try that was a dead end?"
- "What approach would I warn someone else away from?"
- "What assumption was wrong and cost me time?"
Example answers:
- "Don't use the legacy user service for new features—it has race conditions"
- "Don't assume the cache is warm on first request—add a fallback"
- "The 'simple' approach of denormalizing here created update anomalies"
What to do with it: Add as explicit anti-pattern in coding.md. Add warning comments in code if it's a dangerous area. Create ADR explaining why we don't do X.
Making Extraction Lightweight
The enemy of learning extraction is complexity. If your ritual takes thirty minutes, involves multiple tools, and requires perfect articulation, it won't happen. Teams will skip it when they're busy—which is exactly when learnings are most valuable.
The solution is ruthless simplification. Five minutes. Rough notes. Imperfect capture beats no capture.
The 5-Minute Target
- Learning extraction should take approximately 5 minutes
- If it takes longer, you're over-thinking or over-documenting
- The goal is capture, not perfection
- Rough notes beat no notes every time
- You can refine later when promoting to team canon
The Imperfection Principle: Don't wait for perfect articulation. Rough captures are better than none. The first pass is for memory; refinement is for communication.
Where to Capture: Four Options
Option A: In the PR Description
Best for: Linking learnings directly to code changes
- • Add the three questions as a section in your PR template
- • Reviewers see the code change and the extracted learning in one place
Option B: In a Ticket Comment
Best for: Searchable history, visible to team
- • Add a comment with the three sections
- • Links to the specific work
- • Searchable later across all tickets
Option C: In a Team Log
Best for: Visibility and team-wide review
- • Shared document or Slack channel for learnings
- • Daily/weekly collection
- • Good for surfacing patterns across work
Option D: Direct to Canon Files
Best for: Experienced practitioners, obviously applicable learnings
- • If learning is clearly applicable, go straight to learned.md/coding.md
- • Skip intermediate capture
- • Requires good judgment about scope
The Canon Update PR Template
When promoting learnings from personal capture to team or org canon, use a structured PR format. This serves multiple purposes: it provides context for reviewers, creates a searchable history, and enforces the "why does this belong here?" question that prevents canon bloat.
Canon PRs should be reviewed like code PRs—but you're reviewing for accuracy and applicability, not style. The review question is simple: "Is this true? Does it help?"
Full Canon Update PR Template
Trigger: [ticket or PR that surfaced the learning]
What We Learned: [the specific, generalizable insight]
Why It Belongs Here: [why this level (team or org) rather than personal]
Impact If We'd Had This Earlier: [time or pain it would have saved]
Changes Made: [which canon file was updated and what was added]
Related Canon: [links to related entries, if any]
Example: Real Canon Update PR
Trigger
PR #847 - Implement user export endpoint
What We Learned
The export service silently truncates fields longer than 255 characters. No error is thrown—the data is just cut off.
Why It Belongs Here
This is a shared service used by multiple team members. Anyone building export features will hit this. It's not documented anywhere in the service's own docs.
Impact If We'd Had This Earlier
I spent 2 hours debugging why exported names were truncated. A one-line warning in our canon would have saved that time.
Changes Made
Added to coding.md under "Service Gotchas":
- Export service truncates fields >255 chars silently. Validate length before sending or handle truncation gracefully.
Related Canon
See also: security.md note on max field lengths for PII
Review expectations: Canon PRs should be reviewed quickly. Knowledge is perishable—merge fast if it's accurate and applicable. The bar is "Is this true and helpful?" not "Is this perfectly written?"
Mining Conversation History
Here's something most teams miss: every AI coding session contains learnings you haven't captured.
When you work with Claude Code, Cursor, or similar tools, you're constantly making corrections ("no, not like that"), providing clarifications ("the auth flow works like X"), and fixing repeated issues ("always remember to handle null here"). These corrections are invisible learnings—patterns you've internalized but haven't externalized into your canon.
Your conversation history is a goldmine of implicit knowledge. The question is how to extract it systematically.
What to Mine For
Repeated Corrections
If you corrected the same thing multiple times, it belongs in canon.
Pattern: "No, use X instead of Y" appearing 3+ times across different sessions
Domain Knowledge Injections
If you explained the same context repeatedly, it should be in a file.
Pattern: "Remember, in our system..." explanations that recur
Pattern Clarifications
"We do X because of Y" statements that establish rationale.
Pattern: Explanations that connect implementation to business logic
Anti-Pattern Warnings
"Never do X because Z happened last time" cautions.
Pattern: Warnings about footguns or approaches to avoid
Manual Mining Approach
For occasional extraction (weekly or monthly review):
- 1. Review last week's AI sessions: open your Claude Code / Cursor / Copilot history
- 2. Search for correction patterns:
  - "No, instead..."
  - "We always..."
  - "Don't do X..."
  - "The reason is..."
- 3. Extract generalizable statements: convert specific corrections into general rules
- 4. Add to appropriate canon file: triage to personal, team, or org level
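Even this manual pass can be semi-automated. A minimal sketch that pre-filters local session logs for the phrases above, assuming your tool stores conversations as local text or JSONL files (the folder path and phrase list are assumptions; adjust both for your setup):

```typescript
// Illustrative sketch: scan locally stored AI-session logs for correction
// phrases worth promoting to canon. The log folder and phrase list are
// assumptions; adjust both for your tool and your team's vocabulary.
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

const LOG_DIR = join(process.env.HOME ?? ".", ".claude", "projects"); // adjust per tool
const PHRASES = ["no, instead", "we always", "don't do", "never do", "the reason is"];

function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) yield* walk(full);
    else yield full;
  }
}

const candidates: string[] = [];
if (existsSync(LOG_DIR)) {
  for (const file of walk(LOG_DIR)) {
    for (const line of readFileSync(file, "utf8").split("\n")) {
      const lower = line.toLowerCase();
      if (PHRASES.some((phrase) => lower.includes(phrase))) {
        candidates.push(`${file}: ${line.trim().slice(0, 200)}`);
      }
    }
  }
}

// These are proposals for human triage, not canon entries yet.
console.log(candidates.join("\n"));
```

The output is noisy by design; its job is to surface candidates for the human review step, not to write canon entries on its own.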
Automated Mining Approach
For systematic extraction (monthly deep analysis):
Step 1: Collect conversation logs
Many AI coding tools save conversation history locally. Claude Code, for example, stores session transcripts in its .claude/projects folder.
Check your tool's documentation for where conversations are stored.
Step 2: Run a meta-pass with AI
Use an AI model to analyze the conversations themselves. Example prompt:
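One possible sketch, mirroring the "what to mine for" categories above; adjust the canon file names to your own setup:

```markdown
You are reviewing transcripts of my AI coding sessions from the past month.

Extract candidate learnings in four categories:
1. Repeated corrections: instructions I gave more than once across sessions
   ("no, use X instead of Y").
2. Domain knowledge injections: context I explained repeatedly
   ("remember, in our system...").
3. Pattern clarifications: statements connecting implementation to business
   rationale ("we do X because of Y").
4. Anti-pattern warnings: things I told the AI to avoid, and why.

For each candidate, output:
- The general rule, phrased for a teammate who wasn't in the session
- The evidence (which sessions, roughly how often it came up)
- A suggested home: personal learned.md, team coding.md, or org canon

Do not propose rules that are not supported by the transcripts.
```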
Step 3: Human reviews suggestions
AI-extracted learnings need human review. AI may misunderstand context, over-generalize from specific cases, or miss nuance.
Step 4: Promote valid ones to canon
Treat AI extraction as proposals, not decisions. Apply human judgment before promotion.
⚠️ The QA Step Is Critical
AI can extract patterns, but it can also hallucinate connections, miss context, or propose rules that are too broad. Always apply human judgment before adding to canon.
Ask: "Is this actually a pattern, or just coincidence? Would this rule help the team, or create confusion?"
Learning Extraction by Work Type
Not all work produces the same types of learnings. Tailor your extraction focus to the type of work you just completed.
Feature Development
Focus on:
- Where design didn't anticipate reality
- Integration surprises with existing code
- Performance or scalability discoveries
- User-facing edge cases
Typical learnings:
- "The legacy component requires X before Y"
- "This UI pattern doesn't work on mobile Safari"
- "The feature flag system has a 5-minute cache"
Bug Fixes
Focus on:
- Root cause (not just symptom)
- Why it wasn't caught earlier
- What testing would have prevented it
- Similar bugs that might exist
Typical learnings:
- "Race condition: service A must complete before service B"
- "Need integration test for this path—unit tests missed it"
- "This pattern exists in 3 other places—check them"
Refactoring
Focus on:
- What made the old code problematic
- What makes the new code better
- Migration pitfalls encountered
- Patterns established for similar future work
Typical learnings:
- "Old pattern X led to Y problems—new pattern Z avoids this"
- "When migrating, also update A, B, C"
- "The refactored approach is 3x faster—document why"
Spikes and Research
Focus on:
- What approaches were tried and rejected (and why)
- What approach was chosen (and why)
- What we still don't know
- What surprised us about the technology
Typical learnings:
- "Library X seemed promising but has deal-breaker Y"
- "Blog post approach doesn't work in our context because Z"
- "Open question: how does this scale beyond 10k records?"
Common Anti-Patterns in Learning Extraction
Avoid these common mistakes that make learning extraction ineffective:
❌ Anti-Pattern 1: Too Vague
Bad Example
"Had some issues with the API"
Good Example
"The payments API returns 200 status with error body for declined cards—check response.error_code, not status"
Fix: Ask "would this help someone else encountering this?"
❌ Anti-Pattern 2: Too Specific
Bad Example
"On line 847 of user_service.py, the variable name was confusing"
Good Example
"When naming validation functions, use 'validate_X' not 'check_X' for consistency with our patterns"
Fix: Abstract from the specific case to the generalizable pattern
❌ Anti-Pattern 3: Missing the "Why"
Bad Example
"Don't use direct database queries"
Good Example
"Don't use direct database queries—repository pattern allows mocking for tests and centralizes query optimization"
Fix: Always include the reason for the rule
❌ Anti-Pattern 4: Already Documented
Adding something that's already in coding.md creates noise and duplication.
Fix: Search canon before capture. If it exists, maybe strengthen the existing entry rather than duplicate.
❌ Anti-Pattern 5: Never Acted On
Writing learnings that sit in a log forever without being added to canon is wasted effort.
Fix: Extraction isn't done until canon is updated (DoD v2.0). Triage each learning to appropriate file.
Building the Habit
Knowledge is the theory. Practice is the implementation. Here's how to make learning extraction automatic rather than something you only do when you happen to have spare time.
Triggering the Ritual
- ✓ Make it part of the PR merge flow
- ✓ Add a checklist item to the PR template (see the sketch after this list)
- ✓ Include in ticket workflow (can't move to "done" without it)
- ✓ Calendar reminder if needed initially
- ✓ Pair it with an existing habit ("after code review, extract learnings")
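For the PR-template option, one line in the existing template is enough. A minimal sketch for a GitHub pull request template; the path and wording are just one common convention:

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md -->
## Learning extraction (DoD v2.0)
- [ ] Sticking points, surprises, and never-agains captured
      (or "no significant learnings")
- [ ] Relevant canon file updated, or a canon update PR opened
```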
Making It Visible
- ✓ Share good learning extractions in team channel
- ✓ Highlight in sprint review: "This sprint, we added X learnings to canon"
- ✓ Recognize people whose learnings get promoted
- ✓ Create a "learning leaders" metric if useful
Normalizing Imperfection
- ✓ "No significant learnings" is a valid answer
- ✓ Short, rough entries are fine
- ✓ The goal is habit, not perfection
- ✓ Volume of captures > quality of any single capture (initially)
Evolving the Practice
- ✓ Start simple: just the three questions
- ✓ Add structure over time as habit forms
- ✓ Experiment with different capture formats
- ✓ Adapt to team's workflow and tools
Worked Example: Email Verification Feature
Let's walk through a complete example to make this concrete.
The Ticket
FEAT-423: Implement email verification for new user registration
Work Summary: Added verification email on signup, token generation, verification endpoint, expiry handling.
Time spent: 6 hours (estimated: 4 hours)
The Extraction (5 minutes)
Sticking Points:
- The email service client takes 3-5 seconds on first call (cold start). Design didn't account for this.
- The token expiry logic was more complex than expected—needed timezone handling.
Surprises:
- Some email providers bounce instantly for formatting issues (before even attempting delivery)
- The existing user model didn't have a "verified" boolean—required migration
Never Again:
- Don't assume email delivery is instant—always design for async
- Don't forget to add test coverage for email edge cases (bounced, delayed, expired token)
The Canon Updates
To personal learned.md
"Email service client has cold start—first call is slow. Consider warming in background."
To team coding.md (via PR)
Added under "External Services → Email":
- "Email service has 3-5s cold start. First call is slow."
- "Some providers bounce instantly on format issues—handle send errors"
Added under "Database Migrations":
- "Always check if new boolean fields need backfill for existing records"
Flagged for org-level consideration
"All async operations (email, notifications, etc.) should use queue-based architecture, not synchronous calls"
(Needs architect review before promoting to org canon)
TL;DR: Chapter 7 Summary
- • Three questions at every ticket: Sticking points (design vs reality), Surprises (discoveries), Never-agains (anti-patterns)
- • Target 5 minutes: Rough capture beats no capture. Imperfection is fine—first pass is for memory, refinement is for communication
- • Canon Update PRs: Use the template (Trigger, What We Learned, Why It Belongs, Impact, Changes, Related Canon)
- • Mine conversation history: Your AI sessions contain repeated corrections and implicit knowledge—extract it manually or with AI meta-analysis
- • Tailor by work type: Features focus on integration, bugs on root cause, refactors on migration patterns, spikes on rejected approaches
- • Avoid anti-patterns: Not too vague, not too specific, always include "why", check for duplicates, ensure canon gets updated
- • Build the habit: Trigger it (PR checklist), make it visible (sprint reviews), normalize imperfection, evolve over time
What's Next: Knowledge Promotion
You've extracted learnings and updated canon files. But how does knowledge flow from personal to team to org? How do you decide what gets promoted and when?
Chapter 8 provides the clear criteria and process for moving learnings up the hierarchy—turning individual insights into organizational standards.
Chapter 8
Knowledge Promotion
How knowledge flows from individual insight to organizational standard—and why refactoring knowledge is as important as refactoring code.
You refactor code to make it more reusable, more general, more maintainable. Knowledge needs the same treatment.
The raw insight you captured in your personal learned.md after fixing that authentication bug—it's valuable. But it's not in its final form.
Right now, it's tightly coupled to one ticket, written in the language of your immediate frustration, phrased for your future self. It's the knowledge equivalent of a 200-line function that does everything inline.
This chapter is about the elevation process: how you take personal discoveries and refactor them into team patterns, and how team patterns become organizational standards. It's about knowledge promotion—the systematic practice of moving learnings up the hierarchy so they compound instead of staying trapped in individuals.
Because here's the truth: knowledge that stays personal is knowledge that dies with context switches. Projects end, people leave, priorities shift. If your discoveries don't make it into the team or org canon, they evaporate.
The Refactoring Metaphor
Every developer understands refactoring. You take code that works but is hard to maintain—duplicated logic, unclear names, tight coupling—and you restructure it. Extract functions. Introduce abstractions. Rename variables for clarity.
The code still does the same thing. But now it's more general, more reusable, easier to understand. You've made local knowledge global, specific solutions general.
Code Refactoring vs Knowledge Refactoring
Code Refactoring:
- • Extract common patterns into functions
- • Generalize specific solutions
- • Make local knowledge global
- • Improve reusability
Knowledge Refactoring:
- • Extract common learnings from individuals
- • Generalize specific discoveries
- • Make personal insights team-wide
- • Improve knowledge reusability
Knowledge works the same way. Raw learnings are often specific to one context, written quickly, connected to one ticket, phrased for immediate understanding. They're rough drafts.
Team and org canon needs learnings that are generalized beyond the specific case, clearly written for any team member, applicable broadly. That transformation—from personal discovery to shared standard—is knowledge refactoring.
The Generalization Step
Consider the difference:
Personal (Not Refactored)
"On PR #847, the export service truncated my name field and I spent 2 hours debugging why the test data wasn't showing up. Turns out it silently drops anything over 255 chars. WTF."
Team (Refactored)
"The export service silently truncates fields longer than 255 characters without logging warnings. Always validate field length before calling exportData() or handle truncation explicitly in the UI."
Context: Discovered during user data export feature (PR #847). Affects any flow that exports user-generated content with variable-length fields.
The move from personal to team is the refactoring. You've removed the specific (PR number, personal frustration), kept the general pattern (the service behavior, how to handle it), and added context so others can recognize when it applies.
"If your discoveries don't make it into the team or org canon, they evaporate. Knowledge that stays personal is knowledge that dies with context switches."
When to Stay Personal, When to Promote
Not everything should be promoted. The three-layer canon only works if each layer contains the right kind of knowledge. Personal learned.md files are low-friction dumping grounds. Team canon is curated shared truth. Org canon is conservative bedrock.
Here's how to decide:
Keep It Personal When
Personal Preference
Example: "I like seeing error handling at the top of functions before the happy path."
This is coding style, not a team pattern. Keep it in your personal file where it helps you write consistent code.
Workflow-Specific
Example: "When working in Claude Code, I always start sessions by loading infra.md first, then the design doc."
This is your personal workflow optimization. Others might load context differently.
Not Validated Yet
Example: "I think pagination breaks when page size > 100 but I've only seen it once."
Unconfirmed hypothesis. Keep it personal until you validate it or see it happen again.
Context-Specific
Example: "The test database on my local machine needs Redis restarted after every schema migration."
This is your local environment quirk, not a team-wide issue.
Promote to Team When
Multiple People Will Encounter This
Example: "The payment webhook can arrive before the database transaction commits. Always query with a 2-second retry window."
Anyone working on payment flows will hit this race condition. It belongs in team canon.
The Pattern Is Validated
Example: "We've seen three incidents where uncaught exceptions in background jobs caused silent data loss. Always wrap job logic in try/catch with explicit logging."
You've seen it work (or fail) multiple times. It's proven, not speculative.
Shared Systems
Example: "Our Postgres connection pool is configured for max 20 connections. If you need more than 5 concurrent queries in a feature, discuss with the team first."
This is about shared infrastructure everyone touches. Team-wide awareness prevents problems.
Saves Others Time
Example: "The staging environment cache doesn't auto-invalidate like production. Always manually flush after deploying schema changes."
If knowing this saved you an hour of debugging, it'll save your teammates an hour too.
Promote to Org When
The bar is higher for org-level canon. These are principles that apply across teams, affect company-wide systems, or represent architectural decisions that shouldn't be violated without explicit discussion.
Cross-Team Applicability
Example: "All services must expose health check endpoints at /health for the load balancer to monitor."
Every team that deploys services needs to follow this convention. Org-level standard.
Security/Compliance Requirements
Example: "PII must never be logged in plain text. Use the redact() utility from @company/logging for any user data."
This is a compliance requirement that applies everywhere. Non-negotiable org standard.
Architectural Principles
Example: "We use event sourcing for all financial transactions. Direct database updates to transaction records are forbidden."
This is an architectural decision that shapes how multiple teams build features. Belongs in org canon.
Org-level promotion usually requires senior or architect approval because the blast radius is wider. Wrong advice at the org level propagates to every team. Be conservative.
Promotion Decision Tree
Is this just my preference?
→ YES: Stay in personal
→ NO: Continue ↓
Will teammates encounter this?
→ NO: Stay in personal
→ YES: Continue ↓
Is it about our shared systems?
→ NO: Stay in personal
→ YES: Promote to team ↓
Will other teams encounter this?
→ NO: Stay in team
→ YES: Flag for org promotion
The Mechanics of Promotion
Promotion isn't magic. It's a process—specifically, a pull request process. Knowledge flows upward through version control, with review and discussion, just like code.
Personal → Team Promotion
This happens most frequently, ideally multiple times per week as team members extract learnings.
The Promotion Workflow
1. Identify a Learning
Review your personal learned.md during or after work. Spot something that meets the three-question criteria.
2. Refactor for Clarity
Generalize the language. Remove ticket-specific details. Add context about when it applies.
Example: Change "On the invoicing PR" to "When exporting financial data"
3. Create a PR
Open a pull request to the appropriate team canon file (coding.md, infrastructure.md, etc.). Include why this matters in the PR description.
4. Request Peer Review
Tag a teammate familiar with that area. They validate: "Yes, I've seen this too" or "Actually, that's been fixed now" or "Let's clarify the wording."
5. Merge When Approved
Once reviewed, merge. The learning is now part of team canon. AI sessions that load coding.md will see it immediately.
6. Clean Up Personal
Optionally remove it from your personal file if it's now redundant. Or keep a pointer: "See team coding.md for export service guidelines."
Timeline: This can happen anytime, but most teams find a weekly rhythm works well—either during sprint reviews or as a dedicated 15-minute async check-in.
Team → Org Promotion
Less frequent, more deliberate. Usually initiated by senior engineers, tech leads, or architects who spot patterns appearing across multiple teams.
1. Identify Cross-Team Pattern
"Team A added this to their canon. Team B just discovered the same thing. This should probably be an org-wide standard."
2. Verify Across Teams
Before proposing, check: "Teams B and C, does this apply to you too?" Get confirmation it's truly cross-cutting.
3. Refactor for Org-Level Language
Make it more formal, more general. Add references to why this is a company-wide standard (architecture decision, compliance requirement, etc.).
4. Create PR to Org Canon
Request review from senior engineers or architects. May need broader socialization (architecture forum, Slack discussion).
5. Merge After Consensus
Higher bar than team promotion. Needs explicit buy-in because it affects everyone.
Timeline: Quarterly canon reviews or ad-hoc when a clear pattern emerges. Org canon should be stable, updated deliberately.
Recognition Without Gamification
Here's what makes knowledge promotion psychologically powerful: when your insight gets promoted, you've made a lasting contribution. Not to a sprint backlog that gets cleared every two weeks. To the permanent infrastructure of how your team thinks and works.
What Meaningful Recognition Looks Like
Personal → Team
Your discovery is now how the team does things. Git history shows your attribution. Teammates benefit every time AI reads that canon file.
Team → Org
Your team's insight is now company-wide. Your pattern shapes how the entire org works. It's referenced in onboarding, enforced by tooling, cited in architecture reviews.
Durable Impact
Unlike code that gets refactored or features that get deprecated, good canon entries last for years. Your contribution compounds over time.
This is better than gamification—points, badges, leaderboards—because it's real impact, not artificial metrics. Your contribution is visible in the system. Others actually use it. It's durable.
"The reward isn't badges and leaderboards. It's 'this thing you discovered is now how we do things here.'"
Making Recognition Visible
Teams that do this well create simple rituals:
- Call out promotions in team meetings: "Alex's discovery about the export service is now in coding.md. Thanks for documenting that."
- Note in sprint reviews: "This sprint, 3 learnings were promoted to team canon, 1 to org canon. We're building capability, not just shipping features."
- Maintain a contributors log: a simple markdown file or dashboard showing who's actively contributing to canon. Shows engagement, creates social proof.
The visibility matters not for competition, but for culture. It signals: "We value learning and knowledge-sharing as much as we value shipping code."
Curators, Not Gatekeepers
Someone needs to steward each layer of canon. But the role isn't to block contributions—it's to help them flourish.
Curation Responsibilities
✓ What Curators Do
- • Accept good promotions (merge PRs quickly)
- • Improve promoted content (edit for clarity)
- • Identify patterns ("These 5 entries should be one")
- • Prune outdated content (remove what's no longer true)
- • Maintain structure (keep files organized)
- • Coach contributors (help them refactor better)
✗ What Curators Don't Do
- • Block contributions without explanation
- • Enforce perfection (sparse canon is worse than imperfect canon)
- • Hoard knowledge (promote broadly, not narrowly)
- • Let canon rot (inactive curation = dying canon)
- • Create unnecessary process (PRs should take hours, not weeks)
Who curates? Personal layer: you. Team layer: rotating role or natural owners by area. Org layer: senior architects or designated knowledge stewards.
The mindset is gardener, not gatekeeper. You want a thriving, useful canon—not a sparse, perfect one. Improve rather than reject where possible. The goal is contribution velocity, not editorial control.
Handling Conflicts and Contradictions
Knowledge isn't always neat. Sometimes learnings contradict each other. Sometimes new discoveries invalidate old canon. Sometimes people just disagree. The PR process is where you resolve this.
When Learnings Contradict
Developer A adds: "Always validate input at the controller level."
Developer B adds: "Validation belongs in the service layer, not controllers."
Both can't be in canon. Here's how to resolve it:
- Surface the contradiction in the PR discussion. "This conflicts with the existing entry about controller validation."
- Discuss the context. Maybe both are right in different situations: "Controller validation for user input, service layer validation for business rules."
- Decide on a resolution. Update canon to reflect the nuanced answer, not just one person's preference.
- Document the decision. Explain why this is the team's position. Future contributors understand the reasoning.
When New Learning Contradicts Existing Canon
Canon says: "Use library X for JSON parsing."
New learning: "Library X has a critical security vulnerability. We've switched to library Y."
Don't just add a second entry. Update the existing one:
- Verify the new learning. Is this confirmed? (Yes, security advisory published.)
- Propose update to existing entry. PR that replaces old guidance with new.
- Mark old advice as superseded, not deleted. Add strikethrough and a note (see the example after this list): "Updated 2025-12: Use library Y due to CVE-2025-XXXX in library X."
- Explain the change. Canon history shows why practices evolve.
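In the canon file itself, the superseded entry can look like this, reusing the placeholder names from the steps above:

```markdown
### JSON parsing
- Use library Y for JSON parsing.
  ~~Use library X for JSON parsing.~~
  Updated 2025-12: superseded due to CVE-2025-XXXX in library X.
```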
When You Disagree With Existing Canon
You think the canon is wrong. But someone else added it for a reason. Don't just override it.
- Understand the original context. Use git blame to find who added it. Ask: "What was the situation that led to this guideline?"
- Propose an update with reasoning. "In light of [new information / changed circumstances], I think we should revise this to..."
- Be open to nuance. You might both be right in different contexts. Canon can capture that: "Generally use approach A, but in scenario X, use approach B."
- Let discussion improve canon. The back-and-forth in PR comments often produces better guidance than either original position.
Common Anti-Patterns
Knowledge promotion can fail in predictable ways. Watch for these:
Over-Promotion
Symptom: Everything goes to team or org level. Canon files balloon to 5000+ lines.
Problem: Canon becomes bloated, noisy, unusable. Signal-to-noise ratio collapses.
Fix: Stricter application of criteria. Most things should stay personal. Promote only what genuinely helps multiple people.
Under-Promotion
Symptom: Rich personal files, sparse team canon. Same discoveries repeated by different people.
Problem: Knowledge doesn't spread. Team doesn't benefit from individual learning.
Fix: Regular reviews of personal canon for promotion candidates. Make it part of DoD: "Did you promote any learnings this sprint?"
Premature Promotion
Symptom: Untested insights go straight to team or org. Canon contains speculative advice.
Problem: Bad guidance propagates. Team follows unproven patterns, wastes time.
Fix: Validation step. Has this been proven in at least two contexts? If not, keep it personal until validated.
Promotion Without Refactoring
Symptom: Team canon full of ticket-specific language, raw frustration, unclear phrasing.
Problem: Hard to understand, hard to apply in new contexts. Low reusability.
Fix: Always generalize when promoting. Remove the specific, keep the pattern. Edit for clarity before merging.
Stale High-Level Canon
Symptom: Org canon hasn't been touched in 18 months. Contains obsolete advice.
Problem: Wrong advice at org level affects everyone. Teams follow outdated patterns.
Fix: Regular review cadence (quarterly). Actively prune outdated entries. Update when systems change.
The Ripple Effect: How Promotion Multiplies Impact
Here's the math that makes knowledge promotion worth the effort:
Personal Insight → Team Canon
Your 1-hour debugging session produces a learning: "The export service silently truncates fields > 255 chars."
You promote it to team canon.
Impact: 10 teammates each save 30 minutes when they encounter this = 5 hours saved.
If they each hit this situation once per quarter = 20 hours/year saved.
Over 2 years (typical team tenure) = 40 hours saved from your 1-hour investment.
ROI: 40x. And that's conservative—it assumes only your immediate team benefits.
Team Pattern → Org Canon
Your team's learning about payment webhook race conditions gets promoted to org canon.
Impact: 5 other teams working on payment features avoid the same bug = 25+ hours saved across teams.
Each of those teams would have spent ~5 hours debugging and fixing the race condition.
Multiply by team size (6 people) = 150 person-hours of debugging avoided.
Plus: the bug never makes it to production, so no customer impact, no incident response, no reputation damage.
Impact scales with org size. The more teams, the higher the leverage.
The Onboarding Multiplier
Every promoted learning helps every future team member. New hires inherit the full canon on day one. Their ramp-up time is reduced because they don't have to rediscover what the team already knows.
Consider two scenarios:
Without Canon
New developer joins. Over their first 3 months, they:
- • Hit the export service truncation bug (2 hours lost)
- • Discover payment webhook race condition (4 hours lost)
- • Learn Postgres connection pool limits the hard way (3 hours lost)
- • Figure out staging cache behavior through trial and error (2 hours lost)
Total: 11 hours of rediscovering existing knowledge
With Canon
New developer joins. First week, they:
- • Read team coding.md (30 minutes)
- • Load it as context in all AI sessions
- • Never hit any of those bugs
- • Contribute productively from day 3
Time saved: 10.5 hours in first 3 months
Multiply that by every new hire, every year. The compound effect is massive.
Key Insight: Promotion as Infrastructure
Promotion isn't optional polish. It's core infrastructure.
Without promotion, learnings are trapped in individuals. Each person has to rediscover the same truths. Knowledge resets with every context switch, every project handoff, every new hire.
With promotion, learnings compound across the team and org. The difference is linear improvement vs exponential improvement.
Practical Cadence and Tools
Make promotion routine, not heroic.
Weekly Team-Level Reviews
- 15 minutes during team sync or async in Slack
- Review personal → team promotion candidates. "Anyone have learnings to promote this week?"
- Quick discuss-and-approve. Most PRs should be merged same day if they're clear.
- Assign owner for any needed refactoring before merging.
Quarterly Org-Level Reviews
- Part of architecture review or similar forum
- Review team → org promotion candidates. "What patterns are showing up across multiple teams?"
- Prune stale org canon. "Is this still true? Has the system changed?"
- Update after major changes. Platform migrations and architecture shifts invalidate old canon.
Tagging and Tracking
Simple practices that make canon more useful; a combined example follows the list:
- Tag canon entries with source. "Added based on incident INC-4521" or "Discovered during payment refactor (PR #892)." Helps understand context later.
- Track promotion paths. Note in commit messages: "Promoting from personal to team" or "Elevating to org canon after confirmation from Teams B and C."
- Date entries when added. "Added 2025-11" helps with staleness detection.
- Flag for review after system changes. When you migrate databases or switch libraries, mark related canon entries for verification.
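Put together, those practices produce entries like this one, built from the payment-webhook example earlier in the chapter:

```markdown
- The payment webhook can arrive before the database transaction commits.
  Always query with a 2-second retry window.
  (Added 2025-11. Source: payment refactor, PR #892.
   Review if webhook delivery or transaction handling changes.)
```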
Future State: Canon Dashboard
Not essential to start, but useful as practice matures:
- Files by last update. Surface canon that hasn't been touched in 12+ months for review.
- Entries by age. Older entries are candidates for pruning or validation.
- Promotion velocity. Track how many personal → team → org promotions happen per quarter. Declining velocity might signal process problems.
- Contributors. Who's actively contributing to canon? Helps identify knowledge champions.
These are nice-to-haves. The core practice—PR-based promotion with regular reviews—is what matters.
Connecting to the Bigger Picture
Knowledge promotion isn't a standalone practice. It's the upward flow in the full learning system we've been building across these chapters.
The Full Learning Loop
Work is done
Design-first workflow (Ch 4-5), implementation, testing
Learning extraction (Ch 7)
Three questions: what went wrong, what surprised you, what to never do again
Canon update (personal level)
Capture in learned.md, part of Definition of Done v2.0 (Ch 6)
Promotion consideration (this chapter)
Apply criteria, refactor for clarity, PR to appropriate level
Canon is now richer
Next work item benefits from promoted knowledge immediately
Model upgrades multiply value (Ch 9)
Richer canon + better model = exponential gains
Every promotion adds to the scaffolding. And scaffolding is what creates compounding returns when new models arrive. We'll explore that flywheel effect in the next chapter.
Chapter Summary
Key Takeaways
- • Knowledge promotion is refactoring—generalizing personal learnings for reuse across the team and org.
- • Promotion criteria: multiple people affected, validated through use, applicable broadly.
- • Mechanics are simple: PR process with peer review, refactoring for clarity before merging.
- • Recognition through promotion is more meaningful than gamification—your contribution becomes how the team works.
- • Curators are gardeners, not gatekeepers—the goal is contribution velocity, not editorial control.
- • Conflicts and contradictions are good—they produce richer, more nuanced canon through discussion.
- • Watch for anti-patterns: over-promotion (bloat), under-promotion (knowledge silos), premature promotion (bad advice).
- • Promotion multiplies impact—personal insight saves the team hundreds of hours, team patterns save the org thousands.
- • Regular review cadence keeps the system flowing: weekly for team, quarterly for org.
Coming Up: Chapter 9
We've built the infrastructure: the three-layer canon, design documents as primary artifacts, Definition of Done v2.0, learning extraction, knowledge promotion.
Now let's see what happens when you add time to this equation. Why does this system create exponential improvement instead of linear? How do model upgrades turn into multiplicative events? The Model Upgrade Flywheel awaits.
The Model Upgrade Flywheel
Chapter Preview
Every few months, a new model drops. Claude Opus 4.5. GPT-5. Gemini 3.0. What happens next depends entirely on what you've built beforehand.
This chapter reveals why teams with scaffolding get exponential gains from model upgrades while teams without get linear improvement—and how to position your team to harvest future model dividends.
"When the next model arrives, will you start from scratch—or will you plug it into months of accumulated scaffolding and watch your capability jump?"
November 2025. Anthropic releases Claude Opus 4.5. Developers worldwide upgrade their API keys and immediately feel the improvement: the model is smarter, faster, more capable. Instructions are followed more precisely. Code generation is more accurate. Context handling is stronger.
But here's what separates two teams using the same model on the same day:
Team A gets the baseline improvement. Their prompts work better. Their AI sessions feel smoother. They're pleased. They post on Twitter about the upgrade. Life goes on.
Team B experiences something different entirely. Their output quality doesn't just improve—it jumps. Features that used to take multiple iterations now nail it first try. Edge cases that required explicit reminders are handled automatically. Design documents compile into better code than before. The whole team notices.
Same model. Same day. Dramatically different outcomes.
The difference? Team B had months of scaffolding in place—canon files refined through dozens of tickets, design document patterns validated across projects, learnings extracted and promoted through the three-layer hierarchy. When the better engine arrived, it executed against a mature system.
This chapter is about that gap—and why it widens with every model release.
Linear vs Exponential: The Mental Model
Most teams experience AI improvement as a linear function of model capability. Better model = better output. Simple, predictable, bounded.
But there's another curve available—one where improvement compounds, multiplies, accelerates over time. Understanding the difference starts with seeing what you're actually optimizing.
Two Improvement Curves
Linear Improvement (Without Scaffolding)
What happens at upgrade:
- New model is released
- It's smarter, faster, more capable
- You get the benefit of that improvement
- But: you're starting from the same place as before
- Next upgrade: same pattern
The math:
Capability = model_capability
Each upgrade adds model improvement delta. Growth is linear with releases.
Exponential Improvement (With Scaffolding)
What happens at upgrade:
- New model reads your existing canon
- It uses your refined design documents
- All your learnings are executed by a smarter engine
- You get model improvement + scaffolding + interaction effects
- Output quality jumps noticeably
The math:
Capability = model_capability × scaffolding_quality × your_judgment
Each upgrade improves multiple factors. Growth is multiplicative.
After one upgrade cycle, the difference is small. After three, it's noticeable. After six upgrades over 18 months, the teams are operating in different capability universes.
"Every few months: base capability goes up (new model), plus your meta-context is richer, plus your ability to exploit both has improved. Two optimisation loops, interlocking."
The Three Compounding Factors
The exponential curve emerges from three factors that multiply each other. Each improves independently, but the real magic happens in their interaction.
Factor 1: Model Capability
Outside your control (AI labs do this)
But it happens regularly—every 3-6 months, a meaningful upgrade. The trend line is consistently upward.
What's improving:
- Reasoning quality and depth
- Context window size and handling
- Code generation accuracy
- Instruction following precision
- Domain knowledge breadth
Claude Opus 4 leads on SWE-bench at 72.5% and Terminal-bench at 43.2%. For coding, agentic workflows now achieve 95% vs 48% for GPT-4 alone. The baseline keeps rising.
Factor 2: Scaffolding Quality
Completely in your control
Improves through canon development, design doc refinement, learning extraction. The chapters before this one are about building this factor.
What's improving:
- Canon comprehensiveness and accuracy
- Design document clarity and completeness
- Knowledge promotion flow maturity
- Pattern library depth and organization
- Anti-pattern documentation specificity
Your .md files are acting like soft weights sitting on top of the model. Each time you update them, you're effectively doing "manual fine-tuning by text."
Factor 3: Your Judgment
Also in your control
Improves through practice, exposure to quality outputs, developing your ability to critique and specify. The more you use AI with good scaffolding, the better you get at both.
What's improving:
- Ability to spot weak or incorrect AI outputs
- Skill at specifying exactly what you want
- Recognition of what belongs in canon vs personal notes
- Capacity to write effective design documents
- Meta-skill: learning how to learn with AI
Better outputs → you read better examples → you become more discerning → you extract better learnings → scaffolding improves → even better outputs. A self-reinforcing loop.
Why Multiplication Matters
If each factor improves 20% per upgrade cycle:
1.2 × 1.2 × 1.2 = 1.73× improvement
(not 1.6× from simple addition)
Compound over 6 upgrade cycles (18 months):
3.0 × 3.0 × 3.0 = 27× total improvement
(not 6× from linear growth)
This is why it feels exponential—because mathematically, it is. The factors multiply, they don't add.
The Stacked Compounding Effect
Think of it like this: a model upgrade is a better engine. But the performance gain depends entirely on what car you drop that engine into.
Stock Car + Better Engine
Put a high-performance engine in a stock vehicle:
- Chassis not optimized for the power
- Suspension can't handle the torque
- Aerodynamics create drag
- Driver hasn't learned the new power band
Result: Some improvement, but bounded by the system
Tuned Car + Better Engine
Put a high-performance engine in a car you've already tuned:
- Chassis optimized for power delivery
- Suspension dialed in for handling
- Aerodynamics refined for speed
- Driver skilled at extracting maximum performance
Result: Dramatic performance jump—the whole system sings
"Like swapping a better engine into a car you've already tuned hard. All your previously-distilled rules suddenly get executed by a smarter engine."
What "Already Tuned" Looks Like
At the moment of a model upgrade, a well-prepared team has:
- Canon files capturing months of learnings across personal, team, and org layers
- Design documents refined through multiple iterations and validated against reality
- Patterns and anti-patterns documented and organized for quick reference
- Team conventions codified in version-controlled markdown
- Knowledge promotion flow established as routine practice
All of this immediately benefits from the smarter engine. The canon gets read by better reasoning. The design documents get compiled by more accurate code generation. The patterns get applied more consistently.
Nothing in your workflow changes. You just plug in the new model and watch output quality jump.
The Feedback Loop Acceleration
After the upgrade, the loop speeds up:
- Better model produces better outputs
- You read higher quality work
- You become more discerning and critical
- You capture more nuanced learnings
- These feed back into the canon
- The canon becomes even richer for the next upgrade
The flywheel doesn't just spin—it accelerates.
The Opus 4.5 Jump: A Lived Example
Theory is useful. Experience is visceral. Here's what the compounding effect actually feels like when you've done the work.
November 2025: Claude Opus 4.5 Release
The Setup
Months of infrastructure already in place: refined marketing.md, learned.md with current model info and date context, coding.md with established patterns, design-first workflow normalized.
The Upgrade
Changed one line in the config: point to Opus 4.5 instead of 3.5. Same prompts. Same process. Same canon files.
The Experience
"My output just jumped a whole bunch, and it's exponential."
- First-pass accuracy noticeably higher
- Fewer iterations needed to get to "done"
- Edge cases caught without explicit prompting
- Patterns from canon applied more consistently
- Overall quality clearly improved across all outputs
The Compounding
Because outputs were better, reading them sharpened judgment. Because judgment improved, learnings captured were more nuanced. Because learnings were better, learned.md got richer. Next session benefited even more.
Two Optimization Loops, Interlocking
The exponential effect emerges from the interaction of two separate optimization processes running in parallel.
Loop 1: AI Lab Optimization
Who: Anthropic, OpenAI, Google, etc.
What: Training larger models, better architectures, improved reasoning, expanded knowledge
Timeline: 3-6 month cycles per major release
Investment: Millions of dollars, months of compute
Your role: Passive beneficiary—you get the improvement automatically by upgrading
"The model lab is doing SGD on weights"
Loop 2: Your Optimization
Who: You and your team
What: Building canon, refining designs, extracting learnings, developing judgment
Timeline: Continuous, every ticket, every sprint
Investment: 5 minutes per ticket for extraction, weekly for promotion
Your role: Active participant—you control the improvement rate through deliberate practice
"You're doing SGD on your workflows, your canon, and your taste"
The Interlocking
Loop 1 feeds Loop 2
Better model outputs give you better examples to learn from. Your taste improves by exposure to quality.
Loop 2 feeds Loop 1's impact
Better scaffolding means model improvements are more fully utilized. The upgrade doesn't just improve the baseline—it multiplies your prepared system.
The loops accelerate each other
Each cycle of both loops makes the next cycle more valuable. The system doesn't just improve—it improves at an increasing rate.
The Flywheel Mechanism
A flywheel is simple physics: it takes energy to spin up, but once spinning, it maintains momentum with less effort. Each push adds to the existing motion. Eventually, it becomes self-sustaining.
The AI learning flywheel works the same way.
The Six Steps of the Learning Flywheel
Use AI
With current scaffolding (canon, design docs, patterns)
Produce Output
Quality depends on: model capability × scaffolding × judgment
Observe and Critique
Your judgment develops through exposure and practice
Extract Learnings
Definition of Done v2.0: capture what went wrong, what surprised, what to avoid
Update Canon
Scaffolding improves (personal → team → org promotion)
Repeat
Each cycle adds momentum—the flywheel spins faster
When a model upgrade happens:
Steps 1-6 continue unchanged. But step 2 produces higher quality output. Step 3 exposes you to better examples. Steps 4-5 capture more nuanced learnings. The flywheel doesn't just keep spinning—it accelerates.
The Self-Reinforcing Dynamic
This is what "compounding" means operationally:
- Better canon → better AI output (AI reads better context)
- Better AI output → better learning opportunities (you see higher quality examples)
- Better learning → richer canon (more nuanced insights captured)
- Richer canon → even better AI output (the loop closes)
Each revolution of the cycle adds to all previous revolutions. The improvement doesn't reset—it stacks.
Time Horizons and Investment
Building scaffolding feels like overhead initially. The payoff isn't immediate. Understanding the time horizons helps maintain commitment through the ramp-up phase.
Short-Term (1-3 months)
Investment:
- Building initial canon files
- Establishing design doc habits
- Learning the DoD v2.0 practice
Returns:
Marginal improvement. Feels like "Is this worth it?"
Reality: You're spinning up the flywheel. Initial resistance is normal.
Medium-Term (3-6 months)
Investment:
- Refining canon through use
- Promoting learnings regularly
- Building design doc library
Returns:
Noticeable improvement. Feels like "This is helpful."
Reality: Flywheel has momentum. Benefits becoming visible.
Long-Term (6+ months)
Investment:
- Maintaining and gardening
- Periodic review and pruning
- Onboarding others into practice
Returns:
Dramatic advantage. Feels like "Can't imagine working without this."
Reality: Flywheel is self-sustaining. Compounding is obvious.
The First Model Upgrade: When You Really Feel It
If you start building scaffolding today, 3-6 months from now a new model will likely release. That's when the compounding becomes visceral.
The upgrade is free—everyone gets it. But the benefit isn't equal. Those with scaffolding pull ahead. Those without just get the baseline improvement.
That moment—when you feel your prepared system multiply the model's improvement—is when the investment clicks.
Model-Agnostic Scaffolding
One of the most powerful aspects of this approach: your investment isn't tied to a specific model or vendor.
Canon files are plain markdown. Design documents are plain markdown. Knowledge hierarchies are version-controlled text. None of this is Claude-specific or GPT-specific or Gemini-specific.
Why Plain Text Matters
Markdown is:
- Human-readable — you can edit it in any text editor
- AI-readable — every LLM can parse and understand it
- Version-controllable — Git tracks changes, enables collaboration
- Portable — works across tools, platforms, providers
- Durable — won't be obsolete in 5 years
No vendor lock-in. No format dependency. No proprietary API you're betting your knowledge infrastructure on.
Switching Models Is Trivial
If you decide to switch from Claude to GPT to Gemini (or use all three for different tasks):
- Your coding.md works with all of them
- Your marketing.md works with all of them
- Your design documents work with all of them
- Your judgment and practice transfer across all of them
The scaffolding is the constant. The model is the variable you can swap freely.
Riding Multiple Improvement Waves
Because your scaffolding works across providers, you benefit from all model improvement trajectories:
OpenAI Track
GPT-4 → GPT-4o → GPT-5
Anthropic Track
Claude 3 → Claude 3.5 → Claude 4
Google Track
Gemini Pro → Gemini 1.5 → Gemini 3.0
Each wave adds to your accumulated benefit. You're not starting over with each provider. You're bringing your entire scaffolding to every new model.
"Model releases become just another gear in the machine, not the whole machine."
The Competitive Dynamics: The Gap Widens
Let's be blunt about what happens over time when two teams have access to the same models but different scaffolding discipline.
18 Months, 6 Model Upgrades
Team A (With Scaffolding)
- • Month 0: Starts building canon, design docs, learning extraction
- • Month 3: First model upgrade → noticeable jump in output quality
- • Month 6: Canon is rich, patterns established → second upgrade compounds
- • Month 9: Third upgrade → team operating at dramatically higher capability
- • Month 12: Fourth upgrade → compounding is obvious to everyone
- • Month 15: Fifth upgrade → onboarding new hires takes days, not weeks
- • Month 18: Sixth upgrade → team capability is 10-27× initial baseline
Team B (Without Scaffolding)
- • Month 0: Uses AI directly, no systematic learning capture
- • Month 3: First model upgrade → improvement felt, but starts from same baseline
- • Month 6: Second upgrade → another bump, but knowledge still personal
- • Month 9: Third upgrade → good, but repeated discoveries slow progress
- • Month 12: Fourth upgrade → still rediscovering patterns with each ticket
- • Month 15: Fifth upgrade → new hires still take weeks to ramp up
- • Month 18: Sixth upgrade → team capability is 3-6× initial baseline
Same models. Same industry. Dramatically different trajectories.
The gap isn't temporary. It widens with each upgrade cycle.
Team A compounds: better canon → better outputs → better learning → richer canon → even better outputs on next upgrade.
Team B resets: each upgrade helps, but knowledge stays trapped in individuals. Context switches, projects end, people leave—learning evaporates.
The Catch-Up Problem
Can Team B catch up?
Theoretically, yes. They'd need to:
- Build scaffolding infrastructure
- Accumulate months of learnings
- Establish the practice across the team
But here's the problem:
Team A isn't standing still. While Team B is building scaffolding, Team A is refining theirs and benefiting from the next model upgrade.
Catch-up requires faster improvement than the leader. Possible, but increasingly difficult as the gap grows.
First-Mover Advantage
The advantage isn't in the tools—everyone has access to the same models. The advantage is in the scaffolding, which is:
- • Unique to your team
- • Accumulated over time
- • Refined through practice
- • Embedded in your workflow
The earlier you start building, the more upgrade cycles you benefit from.
This is why "start now" matters more than "start perfectly." Rough scaffolding today beats perfect scaffolding started in six months.
Beyond Code: The Pattern Generalizes
We've been using software development as the primary example because it's concrete and measurable. But the flywheel pattern isn't specific to coding.
It applies anywhere you have:
- Repeated AI-assisted work
- Accumulated knowledge that improves outcomes
- Learning opportunities from outputs
Which is... most knowledge work.
Marketing
Canon: marketing.md with brand voice, positioning, audience insights, successful campaign patterns
Compounding: Each campaign teaches something about what resonates. Model upgrades make messaging sharper, more on-brand, more effective.
Flywheel: campaign → results → learnings → richer canon → better next campaign
Sales
Canon: sales.md with objection handling, customer patterns, competitive intelligence, winning proposal structures
Compounding: Each deal teaches something about what moves prospects. Model upgrades make proposals more persuasive, more tailored.
Flywheel: deal → outcome → learnings → richer canon → better next proposal
Support
Canon: support.md with common issues, resolution patterns, product knowledge, escalation criteria
Compounding: Each ticket teaches something about failure modes and customer needs. Model upgrades make responses faster, more accurate.
Flywheel: ticket → resolution → learnings → richer canon → better next response
Operations
Canon: ops.md with runbooks, incident patterns, system knowledge, troubleshooting trees
Compounding: Each incident teaches something about system behavior. Model upgrades make diagnosis faster, more systematic.
Flywheel: incident → resolution → learnings → richer canon → faster next diagnosis
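None of these canon files needs to be elaborate to start compounding. A sketch of what a few support.md entries might look like; the specifics are invented for illustration:

```markdown
# support.md (team canon)

## Known issues
- "My export is empty" reports are usually the export service's silent
  255-character truncation. Check field lengths before escalating.

## Response patterns
- For billing disputes, confirm the invoice ID and plan tier before offering
  a credit. Credits above one month need a lead's approval.

## Escalation criteria
- Page on-call engineering only for data loss, security reports, or outages.
  Everything else goes through the normal ticket queue.
```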
The Org-Wide Flywheel
When multiple teams run their own domain-specific flywheels:
- • Cross-team learning happens through org canon
- • Model upgrades boost everyone simultaneously
- • Knowledge flows: personal → team → org → all teams benefit
- • The organization as a whole compounds capability
This is what "learning organization" looks like in practice—not a theory, but a running system.
TL;DR: Chapter 9 Summary
- • Linear vs exponential: Without scaffolding, improvement is linear with model upgrades. With scaffolding, it's multiplicative.
- • Three compounding factors: Model capability × scaffolding quality × your judgment. Each improves independently, multiply for total effect.
- • Stacked compounding: Like a better engine in a tuned car. Prepared systems extract maximum value from model improvements.
- • Two optimization loops: AI labs improve models (passive benefit), you improve scaffolding (active benefit). Loops interlock and accelerate each other.
- • Opus 4.5 jump: Personal experience validates the theory. Existing scaffolding + better model = dramatic output quality increase.
- • The flywheel: Use → output → critique → learn → update → repeat. Each cycle adds momentum. Model upgrades accelerate the spin.
- • Time horizons: 1-3 months feels like overhead, 3-6 months shows returns, 6+ months creates dramatic advantage. First model upgrade is when it clicks.
- • Model-agnostic: Plain markdown works across all providers. Your investment isn't locked to Claude or GPT. Ride all improvement waves.
- • Competitive dynamics: The gap between scaffolded and non-scaffolded teams widens with each upgrade. Catch-up becomes increasingly difficult.
- • Beyond code: Pattern generalizes to marketing, sales, support, operations—any domain with repeated AI-assisted work and learning opportunities.
What's Next: Cross-Pollination Without Meetings
The flywheel is about time and compounding. You've seen how scaffolding multiplies model improvements, creating exponential gains over linear.
But there's another dimension to this infrastructure: how it changes team dynamics and specialist knowledge sharing. How does Tim the security expert scale his knowledge without meetings? How do architects review at the design level instead of drowning in code? Chapter 10 reveals the social transformation that scaffolding enables.
Chapter 10
Cross-Pollination Without Meetings
How specialists scale their expertise through canon files without requiring meetings, and how architect review shifts from code to design.
Your security expert has knowledge that should be in every developer's head. The traditional approach: training sessions, documentation, meetings. The result: half-forgotten, half-ignored, definitely not applied consistently.
Meet Tim. Tim is your security specialist. He's brilliant, dedicated, and increasingly frustrated. He's written comprehensive security guidelines. He's given training sessions. He's reviewed hundreds of pull requests, leaving the same comments over and over: "Don't log PII in plain text." "Hash before storing." "Validate authentication headers."
The problem isn't that Tim doesn't care. It's that Tim can't scale. He's a bottleneck. Every security-relevant code review needs him. Every architectural decision touches security, so he's in every meeting. Every new developer needs the same training he gave the last five.
And the team? They're not malicious. They're busy. They attended Tim's training, but that was six months ago. They've seen his security doc, but it's 47 pages long and they didn't know which parts applied to their current work. So they do what feels obvious, ship the feature, and wait for Tim to catch it in review.
This pattern repeats across every specialty: the performance expert who optimizes queries, the operations engineer who knows deployment gotchas, the architect who holds system coherence in their head. Specialist knowledge lives in individual humans, and the only way to transfer it is through synchronous, high-touch interactions that don't scale.
The Traditional Knowledge Transfer Problem
Before we solve it, let's be honest about how bad the problem is. Knowledge transfer in software teams has always been difficult, but AI hasn't changed that—yet.
The Four Failing Approaches
1. Training Sessions
Theory: Gather everyone, teach them the patterns once, they'll remember and apply them.
Reality: People retain about 10% after a week. They remember the session happened, not what was said. Six months later, it's as if it never occurred.
Cost: High (specialist time + everyone's calendar). Retention: Terrible. Scalability: Doesn't survive turnover.
2. Documentation
Theory: Write it down once. Everyone reads it. Knowledge preserved forever.
Reality: Documentation is written, never read, immediately stale. It's separate from the work, so developers don't think to consult it. When they do find it, it's out of sync with the current codebase.
Famous last words: "It's documented in the wiki." (Nobody has looked at the wiki in 8 months.)
3. Code Review
Theory: Specialist reviews code, catches mistakes, teaches through comments.
Reality: Feedback comes after implementation. Developer has already invested hours in the wrong approach. Specialist becomes a bottleneck—every PR waits in their queue. Same issues repeated across different developers.
Specialist frustration level: Maximum. "I've explained this pattern six times this month."
4. Meetings
Theory: Bring everyone together for design reviews, architecture discussions, security walkthroughs.
Reality: Most attendees are passively present. "That's Tim's job, I'm just here because it was mandatory." Information is delivered but not retained. No connection to actual work happening that day.
Developer inner monologue: "I wonder if there are snacks after this."
"Nobody really cares, because, oh, I don't do security? That's Tim's job. I'm just like, oh, I'll just sit in the meeting."
Why They All Fail
These approaches share a fatal flaw: they're all push-based, synchronous, and disconnected from the work.
The Core Problems
Push-Based
Knowledge is delivered whether you need it or not. Training on security patterns when you're building a UI component. No relevance = no retention.
Synchronous
Requires everyone's time at once. Doesn't scale with distributed teams, flexible schedules, or asynchronous work. Specialist time is wasted on routine questions.
Disconnected from Work
Training happens in a conference room. The work happens in code. The gap between "I was taught this" and "I apply this" is where knowledge dies.
No Enforcement Mechanism
Even if knowledge transfers, there's no way to ensure it's applied. Code review catches violations after the fact, when re-work is expensive.
The result is predictable: specialists burn out from repetitive teaching, developers keep making the same mistakes, knowledge is unevenly distributed, and onboarding takes forever because everything is relearned person-by-person.
The Canon Solution: Tim Writes security.md
Here's the radically different approach: Tim writes his security knowledge into security.md. That's it. That's the unlock.
Not a 47-page policy document. Not a training deck. Not tribal knowledge locked in his head. A living, version-controlled markdown file that contains the security patterns, anti-patterns, and rules that every developer should follow.
What Goes in security.md
Authentication Requirements
What: All API endpoints must verify JWT tokens. Use requireAuth() middleware from @company/auth.
Why: Prevents unauthorized access. Centralized auth logic reduces attack surface.
Example:
app.get('/api/users/:id', requireAuth(), getUserHandler);
Data Handling Rules
PII Logging: Never log PII in plain text. Use redact() utility for any user data.
Data at Rest: All PII must be encrypted in database. Use encryptedField decorator on model properties.
Data in Transit: HTTPS only for all endpoints. TLS 1.2 minimum.
Common Vulnerabilities to Avoid
SQL Injection: Always use parameterized queries. Never concatenate user input into SQL strings.
XSS: Sanitize all user input before rendering in UI. React does this by default, but be careful with dangerouslySetInnerHTML.
CSRF: All state-changing requests require CSRF tokens. Use csrfProtection() middleware.
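To make these rules concrete, a security.md entry can pair each rule with a short snippet. Here is a hedged sketch of the parameterized-query and PII-logging rules; the redact() helper is the hypothetical utility named above (shown here coming from an assumed @company/logging package), and auth/CSRF middleware would be wired in exactly like the requireAuth() example earlier:

```javascript
const { Pool } = require('pg');                  // any database client that supports parameterized queries
const { redact } = require('@company/logging');  // hypothetical redaction utility, per the PII rule above

const pool = new Pool();

async function findUserByEmail(email) {
  // Parameterized query: user input is bound as a value, never concatenated into the SQL string
  const result = await pool.query('SELECT id, email FROM users WHERE email = $1', [email]);

  // PII never reaches the logs in plain text
  console.log(`lookup for ${redact(email)} returned ${result.rowCount} row(s)`);

  return result.rows[0] ?? null;
}
```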
Security Review Triggers
These changes require Tim's review before merge:
- Authentication or authorization logic changes
- New endpoints handling PII
- Changes to encryption/hashing
- Third-party API integrations
Everything else: AI applies these rules, peers review for correctness.
How It Works in Practice
When a developer starts work on a feature that touches security-relevant code:
The New Workflow
1. Developer Starts Work
"I need to add a password reset endpoint."
2. AI Session Loads Context
Their AI tool (Claude Code, Cursor, etc.) automatically loads:
- security.md (team canon)
- coding.md (team patterns)
- infrastructure.md (deployment requirements)
3. AI Applies Tim's Rules
When designing the endpoint, AI knows:
- Must use requireAuth() middleware
- Must use parameterized queries for database access
- Must hash passwords before storing (never plain text)
- Must include rate limiting to prevent brute force
- Must log security events using securityLogger
4. Design Review
Developer submits design doc. Tim (or a peer) reviews. Design already follows security patterns because AI baked them in. Review is fast.
5. Implementation
Code is generated following the approved design. AI conformance check validates code matches design and follows security.md patterns.
6. Tim's Involvement
Only if needed. If this endpoint is routine, Tim doesn't review at all. If it's novel (new crypto algorithm, third-party auth integration), Tim does a deep review—but only of the truly complex parts.
Notice what happened: Tim's thinking was applied to the work without Tim being present. The developer didn't have to remember training from six months ago. They didn't have to search through 47 pages of documentation. The security patterns were automatically injected as context when they started the work.
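For illustration, here is roughly the shape of the endpoint the AI might produce with security.md in context. This is a sketch, not a prescribed implementation: requireAuth() and securityLogger follow the examples above, while rateLimit(), the @company/security package, bcrypt as the hashing library, and the db module are assumptions standing in for whatever your team actually uses:

```javascript
const express = require('express');
const bcrypt = require('bcrypt');                                    // assumed hashing library
const { requireAuth } = require('@company/auth');                    // per the auth rule in security.md
const { rateLimit, securityLogger } = require('@company/security');  // hypothetical helpers named above
const db = require('./db');                                          // any client with parameterized queries

const app = express();
app.use(express.json());

// Rate limited to blunt brute-force attempts; authenticated per security.md
app.post(
  '/api/password-reset',
  rateLimit({ max: 5, windowMs: 15 * 60 * 1000 }),
  requireAuth(),
  async (req, res) => {
    const { newPassword } = req.body;

    // Never store plain text: hash before persisting
    const passwordHash = await bcrypt.hash(newPassword, 12);
    await db.query('UPDATE users SET password_hash = $1 WHERE id = $2', [passwordHash, req.user.id]);

    // Security events go through the canonical logger, with no PII in plain text
    securityLogger.info('password_reset', { userId: req.user.id });
    res.status(204).end();
  }
);
```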
"Tim does care about security, so he writes a whole bunch of stuff into the scaffolding. Now everyone uses it, and you didn't have to sit through the meeting and ignore it."
Tim's Thinking Gets Replayed Thousands of Times
Here's the leverage: Tim wrote security.md once. Maybe it took him a day to document all the patterns, anti-patterns, and rules he keeps repeating in code reviews. One day of focused work.
Now every developer's AI session reads that file. Every design that touches security applies those rules. Every implementation follows those patterns. Tim's thinking—his security expertise, his hard-won knowledge of what breaks and why—is replayed automatically in every relevant context.
The Multiplication Effect
Before Canon
Tim reviews every security-relevant PR (15+ per week). Each review takes 30-60 minutes. He's in every architecture meeting. Total: 20+ hours/week on knowledge transfer.
After Canon
Tim maintains security.md (1-2 hours/week). Reviews only novel/complex security changes (3-5 hours/week). Total: 5-7 hours/week.
Tim's New Capacity
15 hours/week freed up for strategic work: threat modeling, security architecture, researching new vulnerabilities, proactive hardening.
Specialists as Canon Authors
Tim's pattern generalizes. Every specialist can scale their expertise the same way: by writing it down in a form AI can read and apply.
The Specialist Role Shift
Old Role vs New Role
❌ Old Specialist Model
- Answer questions — Same questions, repeatedly
- Attend meetings — Every architecture discussion
- Review code — Every PR touching their domain
- Give training — Same session for each new hire
- Write docs — That nobody reads
Result: Specialist is bottleneck, knowledge doesn't scale, team remains dependent.
✓ New Specialist Model
- Write canon — Once, maintained over time
- Review canon PRs — Team proposes updates
- Update when rules change — New vulnerabilities, new tools
- Deep consultation — Only genuinely novel cases
- Strategic work — Proactive improvement, architecture
Result: Specialist multiplies impact, knowledge scales automatically, team becomes self-sufficient.
Types of Specialist Knowledge
Different specialties need different canon files. The pattern is the same: capture the patterns and anti-patterns that you would normally teach through meetings and code review.
Security Specialist → security.md
Contents: Auth patterns, data handling, vulnerability prevention, compliance requirements, security review triggers
Impact: Every developer's AI session applies security rules automatically. Fewer vulnerabilities make it to code review. Compliance becomes part of design, not an afterthought.
Performance Engineer → performance.md
Contents: Caching strategies, query optimization patterns, N+1 prevention, async job best practices, monitoring requirements
Impact: AI designs with performance in mind from the start. Database indexes planned upfront. Caching strategy integrated into architecture, not bolted on later.
Operations Engineer → infrastructure.md
Contents: Deployment requirements, logging standards, alerting patterns, health check endpoints, infrastructure as code templates
Impact: Services are designed to be operable from day one. Logging, monitoring, health checks built in—not added during production firefighting.
Domain Expert → domain/{area}.md
Contents: Business rules, edge cases, regulatory constraints, legacy system quirks, data model relationships
Example: billing.md contains: "Invoices cannot be modified after 30 days due to audit requirements. Always create credit note for corrections."
Impact: Developers implement features that respect business constraints without needing to ask. Fewer bugs from misunderstanding domain logic.
The Architect Bandwidth Shift
Architects face the same bottleneck problem as other specialists, but with a twist: they're responsible for system-wide coherence. That means every non-trivial decision potentially needs their input. The traditional model makes this unsustainable.
The Old Architect Model
Traditional architect workload:
- × Review every significant PR — Scan hundreds of lines of code for architectural violations
- × Attend design discussions — Multiple per week, for every team
- × Answer "how should we do this?" questions — Constantly, via Slack, email, hallway conversations
- × Catch violations after implementation — "Why did you add a direct database call here? We use the repository pattern."
- × Bottleneck on all non-trivial decisions — Work waits in their review queue
Result: Architects drowning in detail, no time for strategic thinking, frustrated by repetitive feedback, team blocked waiting for reviews.
The New Architect Model
Canon-based architect workload:
- ✓ Shape the architectural canon — architecture.md contains: patterns we use, patterns we don't, system boundaries, data flow rules
- ✓ Review design documents — Not code. Small surface area, semantic content, catches problems before implementation
- ✓ Curate team/org canon — Review canon update PRs, ensure architectural soundness, refactor knowledge
- ✓ Deep consultation on genuinely novel challenges — Only when the problem is truly new, not covered by existing patterns
- ✓ Strategic thinking about system evolution — Where should the architecture go? What needs to change? How do we migrate?
Architect Time Allocation Shift
Before canon, the architect's week is dominated by code review and routine questions; after canon, that time shifts toward design review, canon curation, and strategic thinking.
Design Review Instead of Code Review
This is the key shift: architects review design documents, not code.
Why this is higher leverage:
- Smaller surface area: Design doc is 1-2 pages. Code PR is 500+ lines. Architect reads less, understands more.
- Catches conceptual mistakes: "You're creating a circular dependency between services." This is invisible in code diff, obvious in design doc.
- Happens before effort invested: Correcting design takes minutes. Rewriting code takes hours.
- Higher signal-to-noise ratio: Design is semantic ("what and why"), code is syntactic ("how, with lots of boilerplate").
"Bring me some design documents, and let's have a talk about that. I don't want to look at every line. We can have a great meeting about the design documents before you work on it."
Architect reviewing a design doc asks:
- "Does this approach fit our architecture?"
- "Are you aware of the existing service that does X?"
- "This creates a dependency on Y—is that intentional?"
- "What happens when Z fails?"
- "Have you considered the performance implications at 10× scale?"
This review takes 15-30 minutes. Code review of the same feature would take 1-2 hours and catch fewer architectural issues.
What AI Can and Can't Do
Code review doesn't disappear; the responsibility for it shifts.
Division of Responsibility
✓ AI Can Do
- Check code against design — "Design says validate input. Code doesn't validate. Flag."
- Apply patterns from canon — "Use repository pattern. Code uses direct DB access. Flag."
- Catch obvious violations — "Don't log PII. This logs email address. Flag."
- Ensure consistency — "Naming convention is snake_case for DB fields. This uses camelCase. Flag."
✗ AI Can't Do (Architect Decides)
- Business sense — "Is this the right problem to solve?"
- Trade-off evaluation — "Performance vs complexity—which matters more here?"
- Architectural coherence — "Does this fit our long-term direction?"
- Novel judgment calls — "We've never done this pattern before—should we?"
The human architect focuses on what AI can't do: judgment, strategy, trade-offs, novel situations. AI handles routine enforcement, pattern application, consistency checking. This is the correct division of labor.
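As a sketch of the "AI can do" column, a conformance check can be a small CI step that hands the design doc, the canon, and the diff to a model and asks only for violations. The callModel() function below is an assumption standing in for whichever AI API or CLI your team uses, and the file paths are illustrative:

```javascript
const fs = require('fs');
const { execSync } = require('child_process');

async function conformanceCheck(callModel) {
  const design = fs.readFileSync('docs/design.md', 'utf8');
  const canon = fs.readFileSync('canon/coding.md', 'utf8')
              + fs.readFileSync('canon/security.md', 'utf8');
  const diff = execSync('git diff origin/main...HEAD').toString();

  // Ask only for violations; judgment calls stay with the human reviewer
  const verdict = await callModel(
    `Design:\n${design}\n\nCanon:\n${canon}\n\nDiff:\n${diff}\n\n` +
    `List any places where the diff violates the design or the canon. ` +
    `Reply "PASS" if there are none.`
  );

  if (!verdict.trim().startsWith('PASS')) {
    console.error(verdict);
    process.exitCode = 1; // fail the CI step; a human decides what happens next
  }
}
```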
Onboarding Transformation
One of the most dramatic benefits of the canon system is what happens when a new team member joins. Traditional onboarding is a slow, painful process. Canon-based onboarding is fundamentally different.
Traditional Onboarding
New developer joins the team. Here's what happens:
- Week 1-2: Shadow other developers — Sit in on meetings, watch code reviews, absorb context passively
- Week 3-4: Small tickets — "Good first issues" that don't touch critical systems
- Week 5-8: Make mistakes — Discover the same gotchas everyone else discovered, get corrected in code review
- Week 9-12: Start to be productive — Finally internalized enough patterns to work semi-independently
- Month 4-6: Approach full productivity — But still missing deep context, still learning tribal knowledge
Timeline: 3-6 months to full productivity. Knowledge transfer is 1:1, slow, incomplete. Every new hire relearns the same lessons.
Canon-Based Onboarding
New developer joins the team. Here's what happens:
The New Developer Experience
Day 1: Repository Access
Clone repo. See /canon folder with team's full knowledge:
- coding.md — How we structure code, patterns we use
- infrastructure.md — Deployment, monitoring, operations
- security.md — Tim's security expertise
- architecture.md — System boundaries, data flow, design principles
Day 2-3: Read and Load Context
Spend 2-3 hours reading canon files. Configure AI tool to load them automatically. Now every AI session has full team context.
Day 4-5: First Real Ticket
Not a "good first issue." A real feature. AI helps design following team patterns. Doesn't make rookie mistakes because canon prevents them.
Week 2-4: Productive
Already contributing meaningfully. AI applies team knowledge they haven't internalized yet. Code reviews focus on learning, not correcting violations of patterns they couldn't have known.
Timeline: 1-2 weeks to productive, 4-6 weeks to full speed.
The difference is staggering. A new developer inherits the team's entire accumulated knowledge on day one—not through osmosis over months, but explicitly through canon files that AI reads and applies.
The Meeting Reduction Effect
When knowledge lives in canon and AI applies it automatically, something interesting happens: a lot of meetings become unnecessary.
Meetings That Disappear
Training Sessions
Before: Tim gives security training to each new developer batch (2 hours, quarterly).
After: New developers read security.md (30 minutes, self-paced). Tim available for questions, but most are answered by canon.
"How Do We Do X?" Meetings
Before: Developer asks in Slack, spawns 30-minute discussion or meeting.
After: AI reads canon, answers question. "According to coding.md, we use the repository pattern for database access. Here's the template..."
Review Meetings for Routine Patterns
Before: Architect reviews every service design in a meeting.
After: AI conformance check validates design against architecture.md. Architect reviews async, only flags genuine issues.
Knowledge Transfer Meetings
Before: "Let me explain how billing works..." (1 hour meeting for each person who needs to know).
After: domain/billing.md contains the explanation. AI reads it. Developer gets context without meeting.
Research validates this. A study on LLM impact on team collaboration found:
"Weekly coordination meetings could be shortened by about 15 minutes on average, since the key points were captured by the AI and did not need to be reiterated multiple times."— ArXiv: Impact of LLMs on Team Collaboration
Meetings That Remain Valuable
Not all meetings go away. Some become more valuable because they're no longer wasted on routine knowledge transfer.
Design Reviews
Still human-to-human, but shorter and more focused. Canon handles routine patterns; meeting focuses on novel challenges and trade-offs.
Novel Architecture Decisions
Genuinely new territory requires human judgment. "Should we adopt event sourcing for this service?" Not in canon yet, needs discussion.
Canon Disagreements
When team needs to resolve conflicts in approach. These discussions improve canon quality through debate.
Strategic Planning
Future direction, roadmap, priorities. These need human creativity and buy-in, not just application of existing patterns.
The Mind-Meld Effect
There's a deeper effect that's harder to quantify but profoundly important: the team develops a shared mental model.
Traditional teams rely on implicit alignment—tribal knowledge, osmosis from meetings, slow convergence through working together. This is fragile. People interpret differently, remember differently, drift over time.
Canon-based teams have explicit alignment. The shared mental model lives in markdown files. When AI reads those files in every session, everyone's work is conditioned by the same context. Implicit becomes explicit. Tribal becomes canonical.
"The mind meld is the design document, to make sure everyone's in agreement of what's going on."
How the Mind-Meld Happens
- All AI sessions use the same canon. Every developer's design work, code generation, problem-solving—all grounded in the same knowledge base.
- All designs reference the same patterns. When someone says "use the repository pattern," everyone knows what that means because it's defined in coding.md.
- All reviews use the same criteria. Code review isn't subjective. "Does this follow our patterns?" has a clear answer: check the canon.
- Implicit alignment through shared infrastructure. You don't need to explicitly synchronize. Working through the same canon naturally aligns thinking.
The team "thinks together" even when working asynchronously. The canon is the shared brain, AI is the interface to it, and every team member's work reflects the collective intelligence.
Making It Work: Practical Considerations
This all sounds great in theory. How do you make it work in practice?
Getting Specialists to Write Canon
Why They Should
- • Scales their impact dramatically (1 day of writing = months of automatic application)
- • Reduces interruptions (fewer questions, fewer meetings)
- • Less repetitive code review (patterns applied automatically)
- • Their thinking lives on (even if they leave or move roles)
- • More time for strategic work (not just answering same questions)
How to Start
- • Identify top 5 things you repeat constantly
- • Write those as canon entries (doesn't need to be perfect)
- • See immediate reduction in questions
- • Build from there, iteratively
- • Accept that some knowledge will stay implicit (that's okay)
Getting Developers to Use Canon
Why They Should
- • Better AI outputs (more accurate, follows team patterns)
- • Fewer review comments (design already correct)
- • Faster approval (reviewers see familiar patterns)
- • Learning happens naturally (AI explains why patterns exist)
How to Ensure It
- • Make canon inclusion automatic (tooling loads it)
- • Part of Definition of Done (process enforcement)
- • Visible in design docs (reference canon explicitly)
- • Quality improvement is immediate (better AI = better work)
Getting Architects to Review Differently
Why They Should
- • More leverage per hour (review concepts, not syntax)
- • Less tedium (not hunting for missing null checks)
- • More strategic work (think about system evolution)
- • Better outcomes (catch problems at design stage)
How to Shift
- • Require design docs for significant work
- • Shift meeting time from code review to design review
- • Use conformance checks for code validation
- • Curate canon as part of role (not extra work)
Chapter Summary
TL;DR
- • Traditional knowledge transfer fails: Training, docs, meetings, code review—all push-based, synchronous, disconnected from work.
- • Canon solution: Specialists write expertise into markdown files (security.md, performance.md, etc.). AI reads them, applies automatically.
- • Tim's leverage: Writes security.md once; his thinking replayed thousands of times through AI. 10-100× impact per hour.
- • Specialist role shift: From answering questions / attending meetings to writing canon / deep consultation on novel challenges.
- • Architect bandwidth shift: From code review (60% time) to design review (30%) + strategic thinking (20%) + canon curation.
- • Onboarding transformation: 3-6 months to full productivity → 1-2 weeks. New members inherit entire team knowledge immediately.
- • Meeting reduction: Training, how-do-we questions, routine reviews become unnecessary. Valuable meetings become shorter, more focused.
- • Mind-meld effect: Shared mental model through shared canon. Team thinks together asynchronously.
- • Implementation keys: Make canon automatic (tooling), part of DoD (process), visible (design docs reference it).
Coming Up: Chapter 11
We've seen how specialists scale expertise and architects shift review burden. But there's a technical dimension we haven't fully explored: how do you manage what goes into AI context?
Canon files are powerful, but they're not free. Every markdown file you load consumes tokens. AI has a finite attention budget. How do you optimize context at team scale? That's the discipline of Context Engineering for Teams.
Context Engineering for Teams
Chapter Preview
You've learned to manage personal canon files and extract learnings. But there's a deeper discipline emerging—one that separates teams that get incrementally better outputs from teams that achieve exponential improvements.
This chapter introduces context engineering: the systematic practice of managing what information enters AI's attention at team scale. It's the natural progression beyond prompt engineering, and mastering it determines whether your team compounds capability or stays stuck at individual productivity levels.
"Prompt engineering is giving someone a task. Context engineering is ensuring they have everything they need—and nothing they don't—to accomplish it brilliantly."
Most teams think they're doing well with AI when they've learned to write better prompts. Clear instructions. Specific requirements. Good examples. This is prompt engineering, and it matters.
But prompt engineering is only half the equation. The other half—the one most teams miss entirely—is context engineering.
This chapter explains what that means, why it's critical, and how to implement it systematically across your team.
From Prompt Engineering to Context Engineering
Anthropic, the company behind Claude, recently described context engineering as "the natural progression of prompt engineering." That framing is exactly right.
Prompt engineering focuses on what you ask. Context engineering focuses on what the model has access to when it thinks.
The Distinction
Prompt Engineering
Focus: The immediate instruction
Question: How do I phrase this request?
Skill: Clear communication, specific requirements
Example: "Write a function that validates email addresses and returns true/false"
Context Engineering
Focus: The entire information environment
Question: What does the model need to know?
Skill: Information architecture, curation, management
Example: Canon files + design docs + relevant code + tool definitions
When teams master prompt engineering but ignore context engineering, they hit a plateau. Outputs improve slightly, then stall. The problem isn't the prompts—it's that the model is reasoning with incomplete, noisy, or contradictory information.
Context engineering fixes this. It's the discipline of architecting your AI agent's entire information ecosystem—not just the prompt, but everything the model has access to during reasoning.
The Finite Attention Budget Problem
Modern AI models have enormous context windows. Claude can process 200,000 tokens. GPT-5 handles 128,000. Gemini claims 1 million.
This creates a dangerous illusion: that you can load everything and let the model figure it out.
But here's what research consistently shows: more context doesn't equal better performance. Beyond a certain density, quality degrades even when you're nowhere near the token limit.
"Context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an 'attention budget' that they draw on when parsing large volumes of context."— Anthropic, "Effective Context Engineering for AI Agents"
The Context Rot Phenomenon
Studies on "needle in a haystack" benchmarks reveal a troubling pattern: as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases.
What this means in practice:
- • Information at position 500 in context is retrieved more reliably than information at position 50,000
- • More tokens = attention spread thinner across all content
- • Accuracy degrades as relevant signal drowns in noise
- • The problem isn't capacity—it's attention distribution
Token Efficiency: Why Density Matters
| Artifact Type | Typical Size | Signal Density | Context ROI |
|---|---|---|---|
| Raw codebase dump | 50,000+ tokens | Low (5-10%) | Poor |
| Conversation history | 20,000+ tokens | Medium (20-30%) | Variable |
| Design document | 2,000 tokens | High (70-80%) | Good |
| Team canon files | 500-1,000 tokens | Very High (90%+) | Excellent |
Canon files represent the highest ROI use of your context budget: minimal tokens, maximum signal.
The counterintuitive lesson: optimal density wins over maximum capacity. A well-curated 50,000-token context often outperforms a bloated 150,000-token context.
This is why context engineering matters. It's not about loading everything you have. It's about loading exactly what's needed, when it's needed, in the cleanest possible form.
Four Context Management Strategies
Anthropic's research on context engineering identifies four core strategies for managing AI attention budgets effectively. Each addresses a different aspect of the context problem.
Strategy 1: Just-In-Time Retrieval
Concept: Don't pre-load everything; keep references and load data dynamically when needed.
"Rather than pre-processing all relevant data up front, agents built with the 'just in time' approach maintain lightweight identifiers (file paths, stored queries, web links, etc.) and use these references to dynamically load data into context at runtime using tools."— Anthropic, "Effective Context Engineering for AI Agents"
Team Application:
- • Canon files: always loaded (small, high-value)
- • Codebase files: loaded via @-mention or tool when needed
- • Documentation: retrieved on demand, not pre-loaded
- • Historical context: summarized first, expanded only if necessary
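A minimal sketch of the just-in-time pattern: the session keeps a lightweight index of references (paths, not contents), and a file's full text is pulled into context only when the current task actually needs it. The file names and topics below are assumptions:

```javascript
const fs = require('fs');

// Lightweight identifiers kept in context at all times (a few hundred tokens at most)
const contextIndex = {
  deployment: 'canon/infrastructure.md',
  security:   'canon/security.md',
  billing:    'canon/domain/billing.md',
};

// Full content loaded on demand, only for topics the task touches
function loadContextFor(topics) {
  return topics
    .filter((topic) => topic in contextIndex)
    .map((topic) => `--- ${contextIndex[topic]} ---\n` + fs.readFileSync(contextIndex[topic], 'utf8'))
    .join('\n\n');
}

// e.g. a deployment task pulls in infrastructure.md but leaves billing.md out of the window
const taskContext = loadContextFor(['deployment', 'security']);
```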
Strategy 2: Compaction
Concept: When approaching context limits, summarize and restart with a clean state.
"Compaction is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary."— Anthropic, "Effective Context Engineering for AI Agents"
Team Application:
- • Long coding sessions: periodically extract key decisions into summary
- • Multi-day projects: start each day with a "state summary" from previous day
- • Design iterations: capture current state, not full exploration history
- • After major milestones: summarize what was learned, restart with clean context
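Compaction can be as simple as watching the rough size of the transcript and, past a threshold, replacing the history with a state summary. The sketch below uses the common "about four characters per token" heuristic for the estimate, and summarize() is an assumption standing in for whatever model call or manual step produces the summary:

```javascript
// Rough heuristic: ~4 characters per token for English prose
const estimateTokens = (text) => Math.ceil(text.length / 4);

async function maybeCompact(messages, summarize, budget = 200_000) {
  const used = estimateTokens(messages.map((m) => m.content).join('\n'));

  // Stay well under the window (the chapter suggests 50-60% of capacity)
  if (used < budget * 0.5) return messages;

  const summary = await summarize(messages); // key decisions, open questions, current state
  return [{ role: 'user', content: `State summary from earlier in this session:\n${summary}` }];
}
```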
Strategy 3: Structured Note-Taking (Agentic Memory)
Concept: Agent writes persistent notes outside the context window, pulling them back in when relevant.
"Structured note-taking, or agentic memory, is a technique where the agent regularly writes notes persisted to memory outside of the context window. These notes get pulled back into the context window at later times."— Anthropic, "Effective Context Engineering for AI Agents"
Team Application:
- • Design docs: persistent notes about project architecture and decisions
- • Canon files: persistent team knowledge that survives sessions
- • Learning extractions: persistent notes about what went wrong and why
- • ADRs (Architecture Decision Records): persistent rationale for key choices
This is exactly what your three-layer canon achieves—persistent, structured memory external to any single AI session.
Strategy 4: Progressive Disclosure
Concept: Let the agent incrementally discover relevant context through exploration rather than front-loading everything.
"Progressive disclosure allows agents to incrementally discover relevant context through exploration. Each interaction yields context that informs the next decision: file sizes suggest complexity; naming conventions hint at purpose; timestamps can be a proxy for relevance."— Anthropic, "Effective Context Engineering for AI Agents"
Team Application:
- • Don't dump entire codebase—let AI explore based on task
- • Provide navigation clues in canon: "For database work, see infrastructure.md"
- • Use folder structure and naming conventions to guide discovery
- • Start with high-level overview, drill down only when needed
What Canon Does for Context
Your three-layer canon (personal → team → org) isn't just knowledge management. It's context engineering infrastructure.
Here's why canon files are uniquely valuable in the context budget:
Canon as High-ROI Context
Small Relative to Code
A team's entire canon might be 5,000 tokens. A single feature's codebase can easily exceed 50,000 tokens.
High Signal, No Syntax
Canon is all meaning, no boilerplate. Every token carries actionable knowledge. Code includes imports, type definitions, error handling—necessary but low-signal for many tasks.
Task-Relevant by Design
Canon files are distilled from actual work failures and discoveries. They contain exactly what developers wish they'd known before starting.
Earns Its Place Every Session
Unlike historical conversation logs or speculative documentation, canon is proven knowledge. It's been validated through use and refined through promotion.
Canon as Context Template
Beyond token efficiency, canon provides structural consistency:
- Predictable structure: AI learns to expect coding.md, security.md, infrastructure.md in every project
- Consistent format: Same template across all sessions reduces cognitive load
- Team alignment: Everyone loads the same context architecture, not ad-hoc personal collections
- Scalable learning: New canon entries automatically available to all team members in next session
"Canon isn't just documentation—it's the stable, high-density foundation of your team's context architecture. Everything else pages in and out, but canon stays constant."
Team-Level Context Practices
Individual developers can practice good context hygiene alone. But teams need shared practices to prevent context fragmentation and ensure everyone benefits from collective learning.
Standardized Context Inclusion
Every AI session should include:
Relevant personal canon
Your learned.md with personal discoveries and preferences
Team canon
coding.md, infrastructure.md, security.md—whatever applies to current work
Task-specific context
Design doc for current feature, relevant code files
Tool configuration
CLAUDE.md, .cursor/rules, or equivalent for your AI tool
Critical: Make this automatic, not manual. Configure tools to auto-include canon files rather than relying on developers to remember.
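One way to make inclusion automatic is a small script that assembles the session preamble from the canon folder, so nobody has to remember which files to paste. A sketch under assumptions: the paths and the .ai-context.md output name are illustrative, and you would point the output at whatever file your AI tool loads automatically (CLAUDE.md, .cursor/rules, etc.):

```javascript
const fs = require('fs');
const path = require('path');

// Files every session should start with, in priority order
const alwaysInclude = [
  'canon/coding.md',
  'canon/security.md',
  'canon/infrastructure.md',
  'docs/current-design.md', // task-specific context, swapped per feature
];

const files = alwaysInclude.filter((file) => fs.existsSync(file));

const preamble = files
  .map((file) => `<!-- ${path.basename(file)} -->\n` + fs.readFileSync(file, 'utf8'))
  .join('\n\n');

// Write wherever your AI tool picks up project context automatically
fs.writeFileSync('.ai-context.md', preamble);
console.log(`Assembled ${files.length} canon files into .ai-context.md`);
```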
Context Hygiene
✓ Good Practices
- • Keep canon files focused—one clear purpose per file
- • Prune regularly—remove outdated entries quarterly
- • Structure for skimmability—AI will scan, not read deeply
- • Use clear section headings and bullet points
- • Include "when this applies" context in entries
- • Link related entries across files
✗ Bad Practices
- • Dumping everything "just in case"
- • Neglecting updates—stale context is worse than no context
- • Inconsistent structure across team files
- • Mixing high-level principles with low-level details in same file
- • No attribution or dates—can't assess relevance
- • Duplicate information across multiple files
Tool-Specific Implementations
Context engineering principles remain constant, but implementation varies by tool. Here's how to apply these practices with popular AI coding assistants.
Claude Code
Context Mechanisms:
- • CLAUDE.md: Auto-included in every session
- • @-mention: Include specific files on demand
- • Plan mode: Design without code keeps context focused on architecture
- • Conversation history: Compacts automatically over time
Team Practice:
- • Maintain shared CLAUDE.md in project root with team canon
- • @-mention personal learned.md when needed
- • Use plan mode for design discussions to avoid code bloat
- • Start fresh sessions after major milestones to reset context
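A shared CLAUDE.md can stay small and simply point at the canon, which is where the substance lives. A hedged sketch (section names and paths are illustrative, not a prescribed format):

```markdown
# Project context

## Canon (always applies)
- Follow canon/coding.md for structure, naming, and error handling.
- Follow canon/security.md for auth, PII, and logging rules.
- Deployment and observability rules live in canon/infrastructure.md.

## Working agreements
- Significant features start with a design doc in docs/designs/.
- When you learn something that should outlive this session, propose an entry for canon/.
```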
Cursor
Context Mechanisms:
- • .cursor/rules: Project-level context file
- • @-mention: Reference specific files
- • Codebase indexing: Progressive disclosure of code structure
- • Multi-file awareness: Can track related files automatically
Team Practice:
- • Maintain .cursor/rules with team canon content
- • Per-developer custom rules for personal preferences
- • Leverage codebase index—let Cursor discover, don't pre-load everything
- • Use @-mention for design docs and ADRs
GitHub Copilot
Context Mechanisms:
- • Workspace context: Automatic based on open files
- • Comments and docstrings: Inline context for suggestions
- • Less explicit control than Claude Code/Cursor
- • Relies more on implicit context from codebase
Team Practice:
- • Invest heavily in inline documentation—it becomes context
- • Use comments to inject canon knowledge when working in specific areas
- • Keep canon files open in editor when working on related code
- • Paste canon excerpts into code comments for critical sections
Diagnosing Context Problems
When AI outputs degrade, the first place to look isn't the model or the prompt—it's the context. Here's how to diagnose common context issues:
Symptom: AI keeps making the same mistakes
Diagnosis: Relevant canon isn't being included in context
Fix: Verify automatic inclusion of canon files. Check tool configuration (CLAUDE.md, .cursor/rules). Confirm canon contains the pattern you expect.
Symptom: AI output ignores team conventions
Diagnosis: Conventions aren't documented in canon, or canon isn't loaded
Fix: Document conventions in coding.md. Verify file is included in context. Check for conflicting information in context that might override.
Symptom: AI seems "dumb" compared to individual use
Diagnosis: Team context may be bloated, contradictory, or too generic
Fix: Audit team canon for noise. Remove outdated entries. Check for contradictions. Ensure specificity—vague advice is worse than no advice.
Symptom: Long sessions degrade in quality
Diagnosis: Context rot from accumulated conversation history
Fix: Apply compaction strategy: summarize key decisions, start fresh session with state summary. Avoid letting conversations exceed 50-60% of context window.
The Context Audit
Periodically review your context architecture:
What's actually in the context?
Check what files are loaded. Measure total token count. Identify largest contributors.
Is it all relevant to the task?
Estimate how much is actually task-applicable vs "might be useful someday."
Is anything missing that should be there?
Compare to recent error patterns. Are learnings making it into canon?
Is there noise that should be removed?
Check for duplicate information, outdated guidance, contradictions.
Think of this like code review, but for context. Most problems trace to context quality, not model capability.
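A context audit does not need tooling beyond a rough token count per file. The sketch below walks a canon folder and prints approximate sizes using the ~4 characters per token heuristic; it is an estimate for spotting outliers against your budget layers, not an exact measurement, and the folder name is an assumption:

```javascript
const fs = require('fs');
const path = require('path');

function auditFolder(dir) {
  const rows = fs.readdirSync(dir)
    .filter((name) => name.endsWith('.md'))
    .map((name) => {
      const text = fs.readFileSync(path.join(dir, name), 'utf8');
      return { file: name, approxTokens: Math.ceil(text.length / 4) };
    })
    .sort((a, b) => b.approxTokens - a.approxTokens);

  const total = rows.reduce((sum, row) => sum + row.approxTokens, 0);
  console.table(rows);                    // largest contributors first
  console.log(`Total: ~${total} tokens`); // compare against your context budget layers
}

auditFolder('canon');
```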
Context Budgeting
Just as you budget CPU cycles or memory allocation in performance-critical systems, context budgeting allocates your finite attention resource deliberately.
Example: 200K Token Context Budget
Layer 1: Always (5-10K tokens)
Contents: Core team conventions, current project context, personal learned.md
Rationale: Minimal tokens, maximum signal. Always worth the cost.
Examples: coding.md (2K), security.md (1K), current design doc (3K), learned.md (1K)
Layer 2: Task-Relevant (10-30K tokens)
Contents: Domain-specific canon, relevant design docs, specific code files
Rationale: Load based on what task requires. Drop after completion.
Examples: infrastructure.md for deployment work (3K), related ADRs (5K), code files being modified (15K)
Layer 3: Reference (retrieved on demand)
Contents: Full codebase, historical designs, external documentation
Rationale: Don't pre-load. Retrieve and summarize only when explicitly needed.
Examples: Legacy module documentation, related features for comparison, API documentation
Conversation + Output Buffer (remaining ~140K)
Contents: Active conversation history, model's working memory for current task
Rationale: Leave headroom. Never max out context—quality degrades near limits.
Rule: Stay under 50-60% total capacity for sustained high performance
Note: This is an approximate guide. Adjust based on experience with your specific workflows and tools.
Context Engineering as Discipline
Context engineering isn't a technique—it's a way of thinking. Every AI interaction is shaped by what information is available when the model reasons. Context quality determines the ceiling of what's possible.
This is now a core engineering skill, not an optional extra. The teams that master it will see dramatically better outputs from the same models everyone else has access to.
"Context engineering is to AI what architecture is to systems. It's the discipline of structuring information for optimal AI reasoning."
How Context Engineering Connects
Canon (Ch 3): The persistent context you maintain
Your three-layer knowledge hierarchy is structured note-taking made systematic
Design docs (Ch 4-5): High-value context for specific work
Design-first workflow ensures context is semantic, not syntactic
Learning extraction (Ch 6-7): Improving context over time
Every extracted learning refines what enters future context windows
Promotion (Ch 8): Scaling context across team/org
Knowledge promotion ensures best context available to everyone
Flywheel (Ch 9): Better context × better model = compounding
Well-engineered context multiplies the value of every model upgrade
The entire system we've been building—canon, design docs, learning extraction, knowledge promotion—is context engineering infrastructure. Each piece optimizes a different aspect of what enters AI's attention and how.
Teams that understand this now will adapt better as tools evolve. The principles remain constant even as capabilities expand:
- High signal, minimal noise — Always
- Right information, right time — Context budgeting
- Persistent, structured knowledge — Canon as foundation
- Just-in-time retrieval — Load what's needed when needed
TL;DR: Chapter 11 Summary
- • Context engineering > prompt engineering: Not just what you ask, but what information is available when AI reasons. Anthropic calls it "the natural progression of prompt engineering."
- • Finite attention budget: AI has limited "working memory." More tokens ≠ better results. Context rot degrades accuracy as noise increases.
- • Four strategies: Just-in-time retrieval, compaction, structured note-taking (canon), progressive disclosure. Apply all four for optimal context management.
- • Canon as high-ROI context: Small token cost, high signal density, task-relevant by design. Canon files earn their place in every session.
- • Team practices: Standardized inclusion (automatic canon loading), context hygiene (regular pruning), tool configuration (CLAUDE.md, .cursor/rules)
- • Diagnose context first: When outputs degrade, check context before blaming model or prompt. Most problems trace to context quality.
- • Context budgeting: Allocate deliberate layers (always/task-relevant/reference). Stay under 50-60% capacity for sustained quality.
- • Core engineering skill: Context engineering is now essential, not optional. Teams that master it get dramatically better outputs from same models.
What's Next: Your First Week
We've covered the theory: canon architecture, design documents, learning extraction, knowledge promotion, context engineering. The complete system for compounding team learning with AI.
But theory without action is just interesting reading. Chapter 12 provides the concrete, step-by-step guide: what do you do on Monday? How do you start building this infrastructure today, with minimal investment, and scale it as you learn?
Your First Week
Chapter Preview
We've covered a lot: three-layer canon, design documents as primary artifacts, Definition of Done v2.0, learning extraction, knowledge promotion, the model upgrade flywheel. Where do you actually start?
This chapter gives you the concrete, actionable roadmap: your first week, step-by-step; the 90-day timeline; what to measure; common challenges and how to overcome them; and the invitation to create the first file today.
"The difference between knowing you should build team learning infrastructure and actually building it is creating one file today."
You've spent eleven chapters understanding how teams can build compounding learning systems around frozen AI models. You understand the architecture. You see the mechanism. You know why it matters.
Now the question becomes: where do you start?
The good news: you don't need to implement everything at once. The minimal viable starting point is small, powerful, and takes less than an hour to set up. By the end of this chapter, you'll have a clear plan for your first week, a 90-day roadmap, and the confidence to begin.
The Minimal Viable Start
Most implementations fail because they're too ambitious. Teams try to redesign their entire workflow, migrate all documentation, train everyone simultaneously. The weight of the effort crushes momentum before it builds.
The alternative: start with the smallest meaningful implementation. One file. One week.
The Core Pattern
Create one shared canon file: coding.md
Put it in your team's repository: Where everyone can access it
Include it in AI sessions: Make it part of your context
Update it when something goes wrong: Capture learnings immediately
That's it. No redesign of processes. No new tools. No complex workflow changes. One file.
Why This Works
- Low barrier to entry: Anyone can create a markdown file. No special tools, no learning curve, no permission needed.
- Immediate value: Your AI outputs improve the first time you include it in context.
- Natural momentum: Using it reveals what else should be in it. The file grows organically.
- Proof of concept: Demonstrates value before larger investment. You'll see results within days.
Within your first week, you'll notice AI suggestions that align better with your team's conventions. Within a month, you'll have dozens of learnings captured. Within a quarter, new team members will onboard faster because the knowledge is externalized.
Week 1: Your Step-by-Step Guide
Here's exactly what to do, day by day. Each step takes 15-30 minutes. By Friday, you'll have a working team learning system.
Day 1: Create the File
Action: Create coding.md in your team's repository.
Seed content (5-10 items):
- • Stack: languages, frameworks, versions you use
- • Conventions: naming patterns, code organization, error handling
- • "We use X, not Y" decisions
- • Common gotchas: things that trip people up
Time required: 15-20 minutes
Day 2: Start Using It
Action: Include coding.md in your AI sessions.
How to include it:
- • Claude Code: Add to CLAUDE.md or @-mention the file
- • Cursor: Add to .cursor/rules
Observe:
- • Does AI follow the patterns you documented?
- • What's missing that you had to correct?
- • Note what should be added
Time required: 5 minutes to set up, ongoing observation
Days 3-4: Refine Based on Use
Action: Add things you noticed were missing.
- • Every time you correct AI about a convention → add it to the file
- • Every time you explain context → consider adding it
- • Keep the file focused but growing organically
- • Don't overthink the phrasing—rough notes are better than no notes
Example additions you might make:
- • "The payment service returns 200 even on failures—always check the response body"
- • "Database migrations require manual approval—ping @database-team in Slack"
- • "Test files go in
__tests__, nottest/"
Time required: 10-15 minutes per day
Day 5: Share with Team
Action: Tell your team what you've done and invite them to try it.
- • Show the file and explain what's in it
- • Demonstrate how to include it in AI sessions
- • Share one concrete example where it improved AI output
- • Invite additions via pull request
Keep it lightweight:
"I've been capturing our team conventions in coding.md. When I include it in AI sessions, outputs match our patterns better. Try it out, and feel free to add anything I've missed."
Time required: 15 minutes (quick demo + Slack message)
What Happens in Week 2-4
Week 2: The file grows organically as you and early adopters add to it. You notice AI outputs are more aligned with team patterns. Teammates start using it after seeing your results.
Week 3: The file becomes valuable reference. When someone asks "how do we handle auth?" you point to coding.md. Knowledge that was tribal is now explicit.
Week 4: The team notices the difference. New members onboard faster. AI sessions produce better first-draft code. You start considering what else should be in team canon.
The 90-Day Roadmap
Week 1 establishes the foundation. The next three months build systematically on that start, introducing new practices as the team gains confidence. Here's the timeline:
Month 1: Foundation
Goals:
- • coding.md established and in use
- • Personal learned.md for key members
- • Team aware of the approach
Success criteria:
- ✓ File exists and is being updated
- ✓ At least 3 team members using it
- ✓ 10+ entries in coding.md
Month 2: Design-First Experimentation
Goals:
- • Design docs for significant work (encouraged, not mandated)
- • Expand canon (add infrastructure.md, security.md)
- • Learning extraction becoming habit
- • First canon PRs from multiple people
Success criteria:
- ✓ 3+ features developed with design docs
- ✓ Canon has 30+ entries across files
- ✓ Multiple contributors to canon
Month 3: Process Integration
Goals:
- • DoD v2.0 formally adopted
- • Design review before code for substantial work
- • Learning extraction in PR template
- • Knowledge promotion process established
Success criteria:
- ✓ DoD checklist updated
- ✓ Design review is normal practice
- ✓ Canon PRs are routine
- ✓ Team notices AI output improvement
Beyond 90 Days: Ongoing Practice
- Canon gardening: Regular pruning and refining. Remove outdated entries, merge duplicates, clarify ambiguous items.
- Knowledge promotion flow: Systematic movement of learnings from personal → team → org layers.
- Waiting for the flywheel: The next model upgrade (3-6 months) is when you'll really feel the compounding effect.
- Expanding scope: If you're seeing value, consider org-level canon or expanding to other teams.
What to Measure
You can't improve what you don't measure. But measuring everything creates noise. Focus on a few key indicators that tell you whether this is working.
Leading Indicators: Early Signs of Success
Track these to see if you're on the right path (Week 2-4):
Canon Health
- • Number of entries in canon files
- • Frequency of updates (weekly is healthy)
- • Number of contributors
Adoption
- • Team members using canon in AI sessions
- • Design docs being written
- • Learning extractions being captured
Process
- • DoD items being completed
- • Canon PRs being submitted
- • Design reviews happening
Lagging Indicators: Actual Impact
Track these to measure results (Month 2-3):
Quality
- • Reduction in repeated mistakes
- • Improvement in first-pass AI output quality
- • Fewer code review comments on patterns/conventions
Efficiency
- • Time to onboard new team members
- • Time spent in routine coordination meetings
- • Speed of design review vs code review
Compounding
- • Output quality jump after model upgrade
- • Ramp-up speed for new projects
- • Knowledge retrieval speed ("it's in the canon")
Common Early Challenges (And How to Fix Them)
You'll hit obstacles. Everyone does. Here are the most common challenges in the first 90 days and proven solutions.
Challenge 1: "Nobody's Using It"
Symptom:
You created the file, but the team isn't adopting it.
Diagnosis:
- • Tooling friction (hard to include in their workflow)
- • Not visible enough (they forgot it exists)
- • Not enough value demonstrated (skepticism)
Fix:
- • Automate inclusion: Add to CLAUDE.md, .cursor/rules, or similar so it's always loaded
- • Show concrete examples: In a team meeting, demonstrate before/after AI outputs
- • Pair with someone: Sit with a teammate and show the difference live
Challenge 2: "It's Getting Messy"
Symptom:
Canon file is growing but feels disorganized.
Diagnosis:
Natural early stage; needs structure as it scales.
Fix:
- • Add clear section headings: Organize by category (Stack, Conventions, Gotchas, etc.)
- • Split into multiple files: If over 200 lines, consider infrastructure.md, security.md
- • This is canon gardening: Normal maintenance, schedule quarterly reviews
Challenge 3: "We Don't Have Time"
Symptom:
Team says they're too busy to write design docs or extract learnings.
Diagnosis:
Perceived as extra work, not as time-saving investment.
Fix:
- • Start smaller: Just the three questions (5 minutes), not elaborate documentation
- • Show time saved: Calculate time spent on repeated mistakes vs time spent capturing learnings
- • Demonstrate better AI output: "This took 2 minutes instead of 20 because the canon had it"
- • Make it part of DoD: Not optional extra work, but required for completion
Challenge 4: "We Already Have Documentation"
Symptom:
Team points to existing wiki/docs as sufficient.
Diagnosis:
Misunderstanding the difference between docs and canon.
Fix:
- • Explain the difference: Canon is actively used as AI context, not passive reference material
- • Show the contrast: AI outputs with vs without canon—concrete demonstration
- • Canon is input: It shapes what AI generates. Docs are output—they describe what was built.
- • Both have value: Canon doesn't replace docs; it serves a different purpose
When You'll Feel the Flywheel
Building this infrastructure is an investment. The returns aren't immediate—but they are exponential. Here's what the journey feels like:
Months 1-3: The Accumulation Phase
You're building the infrastructure. Investment feels higher than return. "Is this worth it?" moments are normal.
What's happening:
- • Canon files growing with team knowledge
- • Habits forming around learning extraction
- • Design-first workflow becoming familiar
- • Trust the process—you're laying foundation
Months 3-6: The Traction Phase
Canon is substantial. AI outputs are noticeably better. Team adoption is widespread. It feels normal, not extra.
What's happening:
- • AI generates code that matches team patterns on first try
- • New team members onboard in days, not weeks
- • Design reviews catch issues that would have been costly bugs
- • The system feels like part of how you work
First Model Upgrade: The Compounding Phase
This is when you feel it. Same workflows, dramatically better results. The flywheel kicks in.
What's happening:
- • New model executing against mature scaffolding = multiplicative improvement
- • Teams without scaffolding get linear gains; you get exponential
- • Your output quality jumps while theirs stays flat
- • "This is why we built the infrastructure"
"When Claude Opus 4.5 dropped, teams with scaffolding got multiplicative gains. Teams without started from zero. Build the scaffolding now. Harvest the compounding later."
The Core Insight
The model is frozen. The scaffolding learns.
You've just spent eleven chapters understanding what that means and how to act on it. The difference between knowing and doing is creating the first file.
Everything in this book—the three-layer canon, design documents as primary artifacts, Definition of Done v2.0, learning extraction, knowledge promotion, the model upgrade flywheel—it all starts with one simple action.
The Invitation
Start this week:
- • Create coding.md
- • Include it in your next AI session
- • Notice what changes
That's the beginning.
What You're Building
Not just better AI outputs—though you'll get those.
You're building:
- A team that compounds its learning instead of resetting every project
- An organization that doesn't lose knowledge when people leave or context switches
- A system that benefits from every model upgrade instead of starting from zero
- Infrastructure for the future of work where AI is a capable executor and teams are the learning substrate
The teams that build this infrastructure in 2025 will be the ones pulling ahead in 2026, 2027, and beyond. Not because they have better AI—everyone will have access to similar models. But because they've built the scaffolding that makes every model upgrade multiply their capability.
The Next Step
Close this book. Open your editor. Create the file.
The scaffolding starts now.
Appendix: Starter Templates
Copy these templates to get started immediately. Customize for your team's context.
Template 1: Starter coding.md
## Stack
- [Language/Version]
- [Framework/Version]
- [Database]
- [Cloud/Infrastructure]
## Conventions
- [Naming conventions]
- [Code organization]
- [Error handling approach]
## We Use / Don't Use
- ✓ [Thing we use and why]
- ✗ [Thing we don't use and why]
## Gotchas
- [Known issue 1]
- [Known issue 2]
## Patterns
- [Pattern we follow and example]
Template 2: Starter learned.md (Personal)
## Context
- Date: [current period, e.g., "Late 2025"]
- Current models: [what you're using, e.g., "Claude Opus 4.5, GPT-5"]
- Current project: [focus area]
## Things That Keep Going Wrong
- [Repeated issue 1]
- [Repeated issue 2]
## My Preferences
- [How I like to work with AI]
- [Patterns that work well for me]
## Project-Specific Notes
- [Context for current work]
Template 3: Learning Extraction Checklist
**Sticking Points** (Where did reality differ from the plan?):
- [What took longer than expected]
- [Where I got stuck]
**Surprises** (What did I discover that wasn't anticipated?):
- [Edge case found]
- [Behaviour that surprised me]
**Never Do This Again** (What approach should we avoid?):
- [Anti-pattern discovered]
- [Time-wasting approach]
**Canon Update**:
- [ ] Added to learned.md
- [ ] PRed to coding.md
- [ ] Flagged for team/org promotion
Chapter Summary
- • Minimal viable start: One shared coding.md file, one week
- • 90-day roadmap: Month 1 (foundation) → Month 2 (design-first experimentation) → Month 3 (process integration)
- • What to measure: Canon health, adoption, quality improvements, efficiency gains, compounding effects
- • Common challenges: Low adoption (automate inclusion), messiness (structure as you grow), perceived overhead (show time savings), confusion with docs (explain the difference)
- • The flywheel timing: Accumulation (Months 1-3) → Traction (Months 3-6) → Compounding (first model upgrade)
- • What you're building: Not just better AI outputs, but a team that compounds learning, an org that retains knowledge, and infrastructure that benefits from every model upgrade
- • The invitation: Create the file today
References & Sources
This ebook synthesises insights from practitioner experience, academic research, industry analysis, and enterprise AI transformation consulting. Sources are organised thematically below with full URLs for further exploration.
Primary Research: Anthropic
Effective Context Engineering for AI Agents
Core framework for context engineering principles, attention budget concepts, and context management strategies including just-in-time retrieval, compaction, and progressive disclosure.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Introducing Claude 4
Claude Opus 4 capabilities and benchmarks, including SWE-bench (72.5%) and Terminal-bench (43.2%) performance metrics.
https://www.anthropic.com/news/claude-4
Estimating AI Productivity Gains
Research on AI's potential impact on US labour productivity, estimating 1.8% annual increase with universal adoption.
https://www.anthropic.com/research/estimating-productivity-gains
Academic & Industry Research
Impact of LLMs on Team Collaboration (ArXiv)
Research on LLM integration in software development teams, measuring meeting time reduction, documentation quality improvements, and cross-functional collaboration effects.
https://arxiv.org/html/2510.08612v1
Three Approaches to Organizational Learning
Academic foundation on knowledge-producing and knowledge-utilising systems in human organisations.
https://home.snu.edu/~jsmith/library/body/v16.pdf
Organizational Knowledge Management Structure
Framework for knowledge team communities and knowledge-oriented problem solving in organisations.
https://realkm.com/wp-content/uploads/2024/01/Organizational_knowledge_management_structure.pdf
The End of Scale (MIT Sloan Management Review)
Analysis of how AI is transforming economies of scale and organisational structures.
https://sloanreview.mit.edu/article/the-end-of-scale/
Development Tools & Practices
Stack Overflow: Beyond Code Generation
How AI is changing team dynamics, including trends toward smaller, more agile teams and documentation of organisational knowledge.
https://stackoverflow.blog/2025/10/06/beyond-code-generation-how-ai-is-changing-tech-teams-dynamics/
Dev.to: Rethinking Team Development in the Age of LLMs
Practitioner insights on documenting implicit rules for LLM consumption and immediate quality improvements.
https://dev.to/shinpr/rethinking-team-development-in-the-age-of-llms-5cml
GitHub Copilot vs Cursor vs Tabnine (GetDX)
Comparative analysis of AI coding assistants, adoption rates, and productivity measurements.
https://getdx.com/blog/compare-copilot-cursor-tabnine/
Cursor vs Copilot vs Clark (Superblocks)
Analysis of AI coding tools' ability to index repositories and enable codebase Q&A at scale.
https://www.superblocks.com/blog/cursor-vs-copilot
AI Productivity Tools for Coders (SideTool)
Microsoft research findings: 90% of developers feel more productive with AI tools; GitHub Copilot users completed tasks 55.8% faster.
https://www.sidetool.co/post/ai-productivity-tools-for-coders-making-development-faster-and-easier/
Architecture & Documentation
AWS: Architecture Decision Records Process
Best practices for ADRs: capturing architectural decisions, context, and consequences in version-controlled documents.
https://docs.aws.amazon.com/prescriptive-guidance/latest/architectural-decision-records/adr-process.html
Microsoft Azure: Architecture Decision Record
ADR as a critical deliverable for solution architects, providing context-specific justifications.
https://learn.microsoft.com/en-us/azure/well-architected/architect-role/architecture-decision-record
Red Hat: Architecture Design Review
Maintaining consistent, documented history of solution designs for architectural understanding.
https://www.redhat.com/en/blog/architecture-design-review
Apidog: Code-First vs Design-First API Workflows
The design-first approach: designing the API contract before writing implementation code.
https://apidog.com/blog/code-first-vs-design-first-api-doc-workflows/
FullScale: Documentation-First Approach
Benefits of documenting systems before implementation: clearer thinking, natural checkpoints, reduced rework.
https://fullscale.io/blog/documentation-first-approach/
AI & Machine Learning
B-Eye: RAG vs Fine Tuning
Comparison of retrieval-augmented generation and fine-tuning approaches for domain-specific AI applications.
https://b-eye.com/blog/rag-vs-fine-tuning/
Monte Carlo Data: RAG vs Fine Tuning
Fundamental differences in how RAG and fine-tuning handle knowledge currency and updates.
https://www.montecarlodata.com/blog-rag-vs-fine-tuning/
Intuition Labs: Anthropic Claude 4 Evolution
Claude 4 capabilities: extended reasoning, tool-use plugins, and working memory enhancements.
https://intuitionlabs.ai/articles/anthropic-claude-4-llm-evolution
Faros AI: Context Engineering for Developers
Practical guide to context engineering as the discipline of architecting AI's information ecosystem.
https://www.faros.ai/blog/context-engineering-for-developers
Process & Methodology
Scrum.org: What is Definition of Done
Formal description of the state of the Increment when it meets quality measures required for the product.
https://www.scrum.org/resources/what-definition-done
Lean Wisdom: Definition of Done and Acceptance Criteria
Effective DoD characteristics: measurable, verifiable criteria agreed upon by the entire team.
https://www.leanwisdom.com/blog/definition-of-done-and-acceptance-criteria/
Gibion AI: Workflow Version Control
AI-powered workflow version control, with a reported 60-85% reduction in documentation time.
https://gibion.ai/blog/workflow-version-control-how-ai-manages-process-updates
Secoda: Versioning ML Applications
Best practices for version control of machine learning models, parameters, and artifacts.
https://www.secoda.co/learn/best-practices-for-versioning-documenting-and-certifying-ai-ml-applications
Index.dev: Version Control Workflow Performance
Statistics: 75% of businesses report productivity improvements; 72% of developers report a 30% reduction in development timelines.
https://www.index.dev/blog/version-control-workflow-performance
Knowledge Management
LinkedIn: Knowledge Management in AI-Enhanced Organisations
AI-enhanced knowledge management capabilities: RAG, memory, contextual awareness, and agentic workflows.
https://www.linkedin.com/pulse/knowledge-management-ai-enhanced-organizations-jerry-kurian-xssmc
SiftHub: Enterprise Knowledge Management
Enterprise knowledge architecture: centralised knowledge bases, document collaboration, and AI-powered search.
https://www.sifthub.io/blog/enterprise-knowledge-management
Relevance AI: Document Version Control AI Agents
AI agents evolving from passive tracking to proactive digital teammates with network effects.
https://relevanceai.com/agent-templates-tasks/document-version-control
Prompts.ai: Team Prompt Sharing
Prompt engineering as a scalable operational advantage through documentation and sharing.
https://www.prompts.ai/ar/blog/strong-ai-platforms-team-prompt-sharing-collaborative-management
Context Engineering AI: Practical Guide
Designing shared knowledge bases, version control for prompts, and quality guardrails.
https://contextengineering.ai/blog/ai-prompt-engineering/
LeverageAI / Scott Farrell
Practitioner frameworks and interpretive analysis developed through enterprise AI transformation consulting. These sources informed the conceptual frameworks presented throughout this ebook.
The Team of One: Why AI Enables Individuals to Outpace Organisations
Source for "bacteria vs sedimentary rock" learning speed concept, individual vs organisational iteration cycles, and the Markdown Operating System framework.
https://leverageai.com.au/the-team-of-one-why-ai-enables-individuals-to-outpace-organizations/
The AI Learning Flywheel
Four-stage compounding loop framework: Exposure → Critical Engagement → Self-Awareness → Co-Evolution. Memory hygiene and context management principles.
https://leverageai.com.au/wp-content/media/The_AI_Learning_Flywheel_ebook.html
Stop Automating. Start Replacing.
Legacy IT as "tuition already paid" concept; learning as the asset, not the software.
https://leverageai.com.au/wp-content/media/Stop_Automating_Start_Replacing_ebook.html
Context Engineering: Why Building AI Agents Feels Like Programming on a VIC-20
Context as a finite resource, the principle that clean context compounds, context rot patterns, and three-tier architecture concepts.
https://leverageai.com.au/wp-content/media/context_engineering_why_building_ai_agents_feels_like_programming_on_a_vic_20_again_ebook.html
The Hidden Architecture of Better AI Reasoning
Pre-Thinking Prompting (PTP) framework: separating problem understanding from problem solving.
https://leverageai.com.au/wp-content/media/The_Hidden_Architecture_of_Better_AI_Reasoning.html
SiloOS: The Markdown Operating System
Four pillars framework: folders, markdown, Python, scheduling. Instructions.md as agent personality and policy layer.
https://leverageai.com.au/wp-content/media/SiloOS.html
Note on Research Methodology
This ebook was compiled in late 2025. Research was gathered through a combination of:
- • Primary sources: Direct engagement with AI coding tools (Claude Code, Cursor) and documentation of emerging patterns
- • Academic research: Peer-reviewed studies on LLM integration in software development teams
- • Industry analysis: Reports from major consulting firms and technology research organisations
- • Practitioner insights: Frameworks developed through enterprise AI transformation consulting
Some URLs may require subscription access or may have been updated since compilation. The concepts and frameworks presented represent the state of practice as of late 2025.