Waterfall Per Increment: How Agentic Coding Changes Everything

Scott Farrell · January 27, 2026 · scott@leverageai.com.au


Why your AI investment isn’t paying off — and what to restructure now

Andrej Karpathy — ex-Tesla AI Director, OpenAI founding member, the guy who coined “vibe coding” — now says he’s never felt more behind as a programmer.1

“Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession.”
— Andrej Karpathy, December 2025

Theo Browne — ex-Twitch engineer, YC founder — generates 11,900 lines of production code without opening an IDE.2 He doesn’t look at the code anymore. This isn’t laziness — it’s the future arriving. And it’s arriving faster than anyone expected.

But here’s what most engineering leaders miss: these elite practitioners aren’t just coding faster. They’ve restructured how they work entirely. And that restructuring is why their investment pays off while yours doesn’t.

66%
of developers spend MORE time fixing AI code than they save writing it3

That statistic should stop you cold. The majority of developers using AI tools are going backwards. Not because AI is bad at coding — it’s remarkably good. But because organizations are optimizing the wrong constraint.

The Constraint Shifted. Your SDLC Didn’t.

For forty years, the bottleneck in software development was implementation. Typing code. Debugging code. Refactoring code. Every process improvement — from Waterfall to Agile — was designed to manage the expensive, error-prone reality of humans translating ideas into working software.

Agile won because it acknowledged that requirements discovery is iterative. Users don’t know what they want until they see it. So we ship small increments, gather feedback, and iterate. The code is the artifact we manage, version, review, and cherish.

Then agentic coding arrived, and implementation became cheap.

“GPT-3.5 (zero shot) achieves 48.1% on coding benchmarks. Wrapped in an agentic workflow, the same model reaches up to 95.1%. The improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating iterative agent workflows.”
— Andrew Ng, on agentic workflow research4

Read that again. A weaker model wrapped in a good agent loop doubles its own zero-shot score and beats a stronger model used naively. Architecture beats capability. Process beats power.

This is the insight most engineering leaders miss. They buy better AI tools. They give everyone Cursor or Copilot subscriptions. They measure “lines of code generated” and feel good about adoption metrics. Meanwhile, their teams are 19% slower than before.5

The Surgery Problem

Here’s where Karpathy’s behavior becomes instructive. When building Nanochat — an 8,000-line production system — he hand-wrote most of it. His explanation:

“It’s basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn’t work well enough at all and net unhelpful.”
— Andrej Karpathy, on Nanochat development6

The inventor of “vibe coding” hand-writes production code. Why?

Because AI excels at greenfield generation but struggles with surgery on existing code. When you ask AI to modify existing systems — integrate with complex codebases, refactor without breaking interfaces, add features to established architectures — it “misses bits” and “tacks on little pieces” without redesigning the whole.

This is why 65% of developers report that AI “misses relevant context during critical tasks like refactoring, writing tests, or reviewing code.”7 The AI is brilliant at first drafts. It’s mediocre at editing.

The implication is counterintuitive: instead of patching AI-generated code, regenerate from better specifications. Delete and rebuild beats edit and accumulate.

Waterfall Per Increment: The New Model

This brings us to the restructuring that actually works.

Agile’s core insight — that requirements emerge through iteration — remains true. Users still don’t know what they want until they see it. You still need feedback loops. You still ship increments.

But inside each increment, the economics have inverted. When code generation is cheap and code surgery is expensive, you want:

  • Detailed specifications upfront — because AI’s first-pass quality depends entirely on spec clarity
  • Ruthless evaluation harnesses — because you verify behavior, not lines of code
  • Regeneration over patching — because fresh generation beats accumulated patches

This looks like waterfall inside each slice: detailed design, then generation, then verification. But it remains agile across slices: iterate on requirements, learn from users, adjust direction.
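
The per-slice loop is small enough to sketch in code. Below is a minimal illustration in Python; the generation, evaluation, and spec-refinement steps are injected as callables, all hypothetical stand-ins for whatever agentic tooling and test harness you actually run.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EvalReport:
    passed: bool
    failures: list[str] = field(default_factory=list)

def build_slice(
    spec: str,
    generate: Callable[[str], str],           # agentic tool: spec -> code
    evaluate: Callable[[str], EvalReport],    # harness: code -> pass/fail
    refine: Callable[[str, list[str]], str],  # fold failures back into the spec
    max_attempts: int = 3,
) -> str:
    """One increment: detailed spec -> generate -> verify -> regenerate."""
    for _ in range(max_attempts):
        code = generate(spec)       # fresh generation each pass, never surgery
        report = evaluate(code)     # judge behavior against the harness
        if report.passed:
            return code
        spec = refine(spec, report.failures)  # improve the spec, not the code
    raise RuntimeError("spec still ambiguous after max_attempts regenerations")
```

Note what’s absent: there is no “patch the generated code” branch. Failures flow back into the spec, and the next pass regenerates from scratch.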

Old Model (Code as Asset)

  • Sprint: rough spec → implement → test → refactor
  • Invest in code quality
  • Review lines of code
  • Patch to improve
  • Bottleneck: implementation speed

New Model (Spec as Asset)

  • Slice: detailed spec → generate → verify → regenerate
  • Invest in spec quality
  • Review diffs against specs
  • Regenerate to improve
  • Bottleneck: specification clarity

The Evidence Is In Your Own Data

If you’ve adopted AI coding tools, you already have the evidence. Look at these patterns:

Pattern 1: Time-to-first-working-version decreased. Time-to-production increased.

AI generates prototypes fast. But integrating those prototypes into existing systems, making them production-ready, and debugging the “almost right” outputs takes longer than before. This is the surgery problem at scale.

Pattern 2: Code churn doubled.

GitClear analyzed 153 million lines of code and found that code churn — lines reverted or updated within two weeks — has doubled since 2021. Code duplication increased 4x.8 AI is generating code that doesn’t stick.
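
You can get a rough read on this from your own repositories. The sketch below is a crude file-level proxy, not GitClear’s line-level methodology: it estimates what fraction of file re-touches happen within a two-week window.

```python
# Crude churn proxy: of all cases where a file is modified again later,
# how often does the re-touch land within `window_days`? GitClear tracks
# individual lines; treat this only as a first-cut signal.
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta

def rework_ratio(repo: str, window_days: int = 14) -> float:
    log = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=%H %ct", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    touches: dict[str, list[datetime]] = defaultdict(list)
    ts = None
    for line in log.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 40 and parts[1].isdigit():
            ts = datetime.fromtimestamp(int(parts[1]))  # commit header line
        elif line.strip() and ts is not None:
            touches[line.strip()].append(ts)            # file path line
    window = timedelta(days=window_days)
    rework = total = 0
    for times in touches.values():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            total += 1
            if later - earlier <= window:
                rework += 1
    return rework / total if total else 0.0
```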

Pattern 3: Senior developers are skeptical. Junior developers are excited.

Only 29% of developers now trust AI outputs, down from 40% one year ago.9 The drop in trust correlates with experience. Seniors have seen the surgery problem; juniors are still impressed by first-pass generation.

Pattern 4: Your fastest teams have better specs, not better prompts.

If you look at where AI actually accelerates delivery, you’ll find a common factor: clear specifications and executable acceptance criteria. The teams struggling have vague requirements and rely on iteration to discover what they want.

What Leaders Actually Need to Change

The restructuring isn’t complicated. It’s just different from what you’ve been doing.

1. Invest in specification quality, not prompting tricks

The bottleneck is now “knowing what to build,” not “building it.” This means your highest-leverage investment is specification infrastructure: templates, examples, acceptance criteria, and the discipline to write them before generating code.
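
What specification infrastructure looks like varies by team, but the forcing function is a template that refuses to stay vague. A hypothetical minimal shape, with every field name and example value invented for illustration:

```python
# Hypothetical spec structure: the point is to force acceptance criteria
# and out-of-scope boundaries to exist BEFORE any generation starts.
from dataclasses import dataclass

@dataclass
class SliceSpec:
    goal: str                       # one sentence: what this slice delivers
    interfaces: list[str]           # APIs/schemas the code must conform to
    acceptance_criteria: list[str]  # each must map to an executable check
    out_of_scope: list[str]         # what the agent must NOT touch
    performance_budget: str         # e.g. "p95 < 200ms on the demo dataset"

checkout_spec = SliceSpec(
    goal="Add promo-code support to checkout totals",
    interfaces=["POST /checkout/total accepts an optional promo_code field"],
    acceptance_criteria=[
        "Expired codes return 422 with error_code=PROMO_EXPIRED",
        "A discount never takes the total below zero",
    ],
    out_of_scope=["payment capture", "promo-code admin UI"],
    performance_budget="p95 < 200ms at 50 RPS",
)
```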

2. Build evaluation harnesses first

Before any generation happens, define how you’ll know it worked. Automated tests, synthetic user journeys, security checks, performance budgets. These aren’t nice-to-haves; they’re the steering wheel. Without them, you can’t tell good generation from bad, and you can’t regenerate with confidence.
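
Concretely, a harness is the acceptance criteria made executable. A sketch using pytest, continuing the hypothetical checkout spec above; the checkout module and its functions stand in for whatever the agent generates:

```python
# Executable acceptance criteria. The harness is written before generation
# and survives every regeneration; the generated module is disposable.
import pytest
from checkout import apply_promo, PromoExpired  # hypothetical generated module

def test_expired_code_is_rejected():
    with pytest.raises(PromoExpired):
        apply_promo(total=100.00, code="SUMMER24-EXPIRED")

def test_discount_never_goes_negative():
    assert apply_promo(total=5.00, code="TAKE10") >= 0.00

def test_no_code_means_no_change():
    assert apply_promo(total=42.00, code=None) == 42.00
```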

3. Treat code as compiled output

The spec is your source code. The generated code is like a compiled binary. You don’t hand-edit binaries; you fix the source and recompile. Similarly, when requirements change or bugs emerge, your first instinct should be “improve the spec and regenerate” not “patch the generated code.”

“We’re moving from ‘code is the source of truth’ to ‘intent is the source of truth.’”
— GitHub Engineering Blog10

4. Redesign code review

Traditional code review assumes a human wrote the code and made human-shaped mistakes. AI-generated code has different failure modes: subtle incorrectness, context blindness, pattern mismatches. Review should focus on: Does this match the spec? Do the tests pass? Does the diff make sense? Not: Is this elegantly written?
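
One way to operationalize this is a pre-review gate in CI that settles the mechanical questions before a human opens the diff. A sketch, assuming a forbidden-paths list derived from the spec’s out-of-scope section (paths and test layout are hypothetical):

```python
# Pre-review gate: settle the mechanical questions in CI so the human
# review can focus on spec conformance rather than style.
import subprocess

def changed_files(base: str = "origin/main") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

def review_gate(forbidden_paths: list[str], base: str = "origin/main") -> list[str]:
    """Return a list of problems; an empty list means ready for human review."""
    problems = []
    # 1. Does the diff stay inside the slice's declared boundaries?
    for path in changed_files(base):
        if any(path.startswith(zone) for zone in forbidden_paths):
            problems.append(f"out-of-scope change: {path}")
    # 2. Do the acceptance tests pass?
    if subprocess.run(["pytest", "-q", "tests/acceptance"]).returncode != 0:
        problems.append("acceptance harness failed")
    return problems
```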

5. Accept regeneration as normal

When a module needs significant changes, delete and regenerate from an updated spec. This feels wasteful to developers trained on code-as-precious-artifact. But regeneration is cheap; surgery is expensive. Model upgrades are free improvements if you regenerate; they’re migration costs if you’re nursing patched outputs.

The Transition Is Happening Now

In February 2001, seventeen software leaders met at Snowbird, Utah, and published the Agile Manifesto. The transition from Waterfall to Agile took over a decade; since 2002, Agile usage has increased by 88%.11

The current transition will move faster. AI capability doubles roughly every 18 months. The gap between “AI-augmented teams using old workflows” and “AI-native teams using restructured workflows” is widening rapidly.

“Like in 2025, the skepticism from 1-2 years ago is gone and almost everyone (93%) expects big gains, yet almost no one (3%) is seeing the impact yet.”
— David Heiny, on engineering leader surveys in 202612

93% expect gains. 3% are seeing them. The difference is workflow restructuring.

The engineering leaders who restructure now will build compounding advantages: specification libraries that improve with each project, evaluation harnesses that catch more issues over time, developers who become specification experts rather than code archaeologists.

The leaders who keep buying better AI tools without restructuring will keep wondering why their investment doesn’t pay off.


The Bottom Line

Your SDLC was designed for an era when code was expensive to produce. That era ended.

The new constraint is specification clarity and evaluation quality. Optimize for that, and AI becomes the 95% performer that research promised. Ignore that, and you’re in the 66% who spend more time fixing than saving.

Agile doesn’t die. It mutates. Each increment becomes a mini-waterfall: detailed spec, ruthless evaluation, regenerate rather than patch. Between increments, you iterate on requirements just like before.

This is waterfall per increment. This is how agentic coding changes everything.

The question isn’t whether to adopt AI coding tools. You already have. The question is whether you’ll restructure your workflow to match what the tools actually do well — or keep optimizing a constraint that no longer binds.

References

  1. Business Insider. “The guy who coined ‘vibe coding’ now says he’s never felt more behind as a programmer.” December 2025. businessinsider.com/openai-founding-member-never-felt-so-behind-programmer-2025-12 — “Clearly some powerful alien tool was handed around except it comes with no manual…”
  2. Browne, Theo. “I’m Addicted to Claude Code.” YouTube, 2025. youtube.com/watch?v=-5LfRL82Jck — Documents 11,900 lines of production code generated without opening an IDE.
  3. AskFlux. “AI Generated Code: Revisiting the Iron Triangle in 2025.” askflux.ai/blog/ai-generated-code-revisiting-the-iron-triangle-in-2025 — “66% report spending more time fixing ‘almost-right’ AI generated code than they save in the initial writing phase.”
  4. Ng, Andrew. “How Agents Can Improve LLM Performance.” DeepLearning.AI, The Batch. deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/ — “GPT-3.5 (zero shot) was 48.1% correct… wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.”
  5. METR study, cited in AskFlux. “AI Generated Code: Revisiting the Iron Triangle in 2025.” — “The METR study tracked experienced open-source developers and found they took 19% MORE time to complete tasks when using AI tools.”
  6. Times of India. “Tesla’s former AI Director Andrej Karpathy sends ‘Open Letter’.” timesofindia.indiatimes.com/technology/tech-news/teslas-former-ai-director-andrej-karpathy-sends-open-letter-to-software-engineers-i-never-felt-this-much-behind-as-a-programmer-profession-is/articleshow/126202051.cms — “It’s basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn’t work well enough.”
  7. Qodo. “State of AI Code Quality 2025 Survey.” prnewswire.com/il/news-releases/despite-78-claiming-productivity-gains-two-in-three-developers-say-ai-misses-critical-context-according-to-qodo-survey-302480084.html — “65% of developers say AI misses relevant context during critical tasks like refactoring, writing tests, or reviewing code.”
  8. GitClear. “AI Copilot Code Quality 2025 Research.” gitclear.com/ai_assistant_code_quality_2025_research — Code churn doubled from 2021 to 2024; code duplication increased 4x.
  9. Stack Overflow. “2025 Developer Survey – AI Section.” survey.stackoverflow.co/2025/ai — “Only 29% of developers trust AI tool outputs now, down from 40% just a year ago.”
  10. GitHub. “Spec-driven development with AI.” github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/ — “We’re moving from ‘code is the source of truth’ to ‘intent is the source of truth.’”
  11. Playsdev. “Evolution of Development Methodologies.” playsdev.com/blog/evolution-of-development-methodologies/ — “Since 2002, the use of Agile has increased by 88%.”
  12. Heiny, David. “AI Conversations 2026 with Engineering Leaders.” LinkedIn, January 2026. linkedin.com/posts/david-heiny_our-first-ai-conversations-of-2026-with-engineering-activity-7413876726167977984-OBkh — “93% expects big gains, yet almost no one (3%) is seeing the impact yet.”
