Enterprise AI Strategy

Maximising AI Cognition and AI Value Creation

Why Most Projects Fail and How to Find the Frontier

A framework for enterprise AI deployment that plays to the technology's strengths

By Scott Farrell

What You'll Learn

  • Why 70-85% of AI projects fail—and the pattern behind the 15% that succeed
  • The 2×2 framework that predicts which deployments will work
  • Three versions of AI value—and why Version 3 is the untapped frontier
  • Hyper sprints, marketplace of one, and thinking that was never feasible before

The Uncomfortable Truth About AI Failure

In the boardroom of a mid-sized insurance company, the mood was triumphant. After nine months of development and a $2.3 million investment, their AI-powered customer service chatbot was ready to launch. The CEO had promised the board something impressive. Competitors were making AI announcements. The pressure was on.

Six weeks later, the same boardroom witnessed a very different conversation. Customer complaints had spiked. Net Promoter Score had dropped. Support staff were fielding escalations from customers who couldn't get simple questions answered. The chatbot was quietly pulled offline. The project became a cautionary tale whispered in other boardrooms: "Remember when they tried AI?"

This scenario isn't fictional. It's the pattern playing out across enterprises worldwide. And the scale of the problem should alarm every executive allocating capital to AI.

The Statistics That Should Alarm You

Let's start with the uncomfortable numbers:

The AI Failure Landscape

70-85% of AI projects fail to meet expected outcomes

95% of corporate AI pilots show zero return on investment

42% of companies abandoned most AI initiatives in 2025 (up from just 17% in 2024)

46% of AI proof-of-concepts scrapped before production, on average

2x AI projects fail at twice the rate of traditional technology projects

— RAND Corporation analysis; MIT Media Lab 2025 study; S&P Global Market Intelligence survey

Read those numbers again. This isn't a rounding error. This isn't a temporary growing pain. AI failure is officially the norm.

"Despite $30–40 billion in enterprise investment in generative artificial intelligence, AI pilot failure is officially the norm — 95% of corporate AI initiatives show zero return."
— MIT Media Lab, "The State of AI in Business 2025"

The POC-to-Production Chasm

The gap between proof-of-concept and production deployment has become a graveyard for AI ambitions. The data reveals a troubling pattern:

  • Only 26% of organisations have the capabilities to move beyond POC to production
  • Only 6% qualify as "AI high performers" (achieving 5%+ EBIT impact)
  • 74% of companies have yet to show tangible value despite widespread investment
— BCG analysis (late 2024); Agility at Scale research; PMI (Project Management Institute) research on AI deployment

Think about what this means: three-quarters of enterprises investing in AI are seeing no meaningful return. They're running pilots, attending conferences, hiring consultants, buying tools—and getting nothing.


The Root Cause: Forcing AI Into Unchanged Processes

When MIT's Media Lab systematically reviewed over 300 publicly disclosed AI initiatives, a pattern emerged. The research, led by Aditya Challapally, identified the core failure mechanism:

"Most AI efforts falter due to a lack of alignment between technology and business workflows. Companies have attempted to force generative AI into existing processes with minimal adaptation."
— MIT Media Lab, 2025

RAND Corporation's analysis reinforces this finding: misunderstandings and miscommunications about the intent and purpose of AI projects are the most common reasons for failure. Organizations launch AI initiatives without genuine clarity about what problem they're solving or why AI is the right tool.

— RAND Corporation, "Root Causes of AI Project Failure"

The List of Failures

Beyond the alignment problem, enterprise AI projects stumble over a predictable set of obstacles:

Common AI Project Failure Points

Data & Governance
  • Poor data hygiene and quality
  • Lack of governance frameworks
  • Data privacy and security risks
Infrastructure & Operations
  • Inappropriate internal infrastructure
  • Lack of proper AI operations capability
  • Cost overruns that impact profitability
Strategy & Execution
  • Failure to choose the right proof of concept
  • Treating AI deployment as SaaS procurement
  • Misalignment between tech and workflows
Organizational Readiness
  • Off-the-shelf tools have lower adoption than custom
  • Lack of skilled internal teams
  • Resistance to workflow redesign
— NTT Data; RAND Corporation; S&P Global analysis

Notice what's missing from this list: the AI itself. The technology isn't failing. The deployments are.

The Puzzling Contrast

Here's where the narrative gets interesting. While the majority fail spectacularly, a minority succeed spectacularly. The contrast is stark:

The AI Success Stories

$3.70 return per dollar invested for early adopters

Companies that moved into generative AI adoption early report consistent positive ROI

$10.30 return per dollar invested for top performers

High performers achieve more than 10x return on AI investments

74% achieved ROI within the first year

Google Cloud study (September 2025) shows rapid value realization is possible

$40M annual benefit from single AI assistant

Klarna's AI assistant replaced work of 700 employees, contributing estimated $40M profit improvement in 2024

— Fullview AI Statistics 2025; Articsledge AI ROI Case Study; Google Cloud ROI study; Punku AI Enterprise Adoption research

Let's pause on that Klarna example. A single AI system delivering a net benefit of $35-38 million annually (after technology costs of $2-5 million). That's not incremental improvement. That's transformation.

So we face a paradox:

  • 85% of organizations see AI projects fail
  • 15% achieve returns of 3x to 10x on their investment

The difference isn't the technology. Everyone has access to the same models, the same tools, the same cloud platforms. The difference is where and how they deploy.

The Question This Book Answers

What separates the 15% that succeed from the 85% that fail?

The answer isn't:

  • Better AI models (everyone has access to GPT-5, Claude, Gemini)
  • Bigger budgets (failures waste millions; successes often start small)
  • Better vendors (same platforms are used by winners and losers)
  • More AI expertise (PhD-heavy teams fail as often as pragmatic ones)

The answer is:

Successful organizations deploy AI where it has asymmetric advantage.

They don't try to make AI work everywhere. They identify the specific contexts where AI's strengths dramatically outweigh its weaknesses—and they avoid the contexts where the opposite is true.

This book gives you the framework to make that distinction.

By the end of this book, you won't just understand why 85% of AI projects fail. You'll have a systematic way to ensure yours doesn't join them.

Key Takeaways

  • AI failure isn't the exception—it's the norm. 70-85% of projects fail, with 95% showing zero ROI. This isn't a temporary problem; it's a systematic deployment error.
  • The pattern: forcing AI into unchanged processes. Organizations try to bolt AI onto existing workflows with minimal adaptation, creating misalignment between technology capabilities and business reality.
  • Some companies achieve 10x returns. The contrast between 85% failure and 15% spectacular success reveals that deployment decisions, not technology choices, determine outcomes.
  • The answer isn't better AI; it's better deployment decisions. Success comes from identifying contexts where AI has asymmetric advantage and avoiding contexts where it doesn't.

The uncomfortable truth is that most organizations are approaching AI backwards. They're asking "Where can we use AI?" when they should be asking "Where does AI create asymmetric value?"

The next chapter shows you the economic logic behind that reframe.

The Real Carrot: Cost of Cognition

Strip away the hype. What do companies actually want from AI? "Getting on the bandwagon" isn't a business case you can take to a CFO. But underneath the noise, there's a genuine economic story—one that changes how we think about strategic capability.

If you ask a CEO why they're investing in AI, they'll rarely say the quiet part out loud. But when pressed—when you get past the "innovation agenda" and the "staying competitive" rhetoric—the answer almost always comes down to economics.

And the economics aren't about robots. They're about thinking.

From "AI" to "Cheap Cognition"

Here's the reframe that cuts through: firms buy AI because they believe thinking per hour is going to be way cheaper than human thinking per hour.

Let's give it a name: cost per unit of useful cognition.

For humans, that's not just salary. It's salary plus all the on-costs: management overhead, desk space, tooling, training, coordination overhead, and—let's be honest—meetings to decide what the meetings meant.

For AI, you've got a different stack of costs:

  • Model costs: per token, per call, licensing fees
  • Platform and orchestration: the infrastructure to make AI callable
  • Integration: hooking AI into existing systems and data
  • Governance and compliance: policy frameworks, audit trails, risk controls
  • Monitoring and observability: dashboards, logs, anomaly detection
  • Incident response: handling errors, rollbacks, escalation paths

So no, AI isn't "$1 per hour" when you account for everything. But the proposition still holds: once the plumbing is in place, the marginal cost of an additional thinking task trends toward cents, not dollars.
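
To make the comparison concrete, here's a back-of-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a benchmark, but it shows the shape of the economics: the fixed plumbing dominates at low volume, and the marginal cost per task falls toward cents as volume grows.

```python
# Back-of-envelope "cost per unit of useful cognition" comparison.
# Every figure below is an illustrative assumption, not a benchmark.
SALARY, ON_COSTS = 90_000, 45_000     # salary plus management, space, tooling, coordination
PRODUCTIVE_HOURS = 1_400              # hours of actual task work per year, after meetings
TASKS_PER_HOUR = 4                    # assumed human throughput on this task type

human_cost_per_task = (SALARY + ON_COSTS) / (PRODUCTIVE_HOURS * TASKS_PER_HOUR)   # ~$24

PLATFORM_FIXED = 250_000              # annual plumbing: integration, governance, monitoring, ops
TOKENS_PER_TASK = 5_000               # assumed tokens consumed per thinking task
PRICE_PER_M_TOKENS = 2.00             # assumed blended $ per million tokens

ai_marginal_per_task = TOKENS_PER_TASK / 1e6 * PRICE_PER_M_TOKENS                 # ~$0.01

def ai_cost_per_task(annual_volume: int) -> float:
    """All-in AI cost per task: fixed plumbing amortised over volume, plus marginal inference."""
    return PLATFORM_FIXED / annual_volume + ai_marginal_per_task

for volume in (10_000, 100_000, 1_000_000):
    print(f"{volume:>9,} tasks/yr  human ${human_cost_per_task:.2f}  ai ${ai_cost_per_task(volume):.2f}")
```

On these assumed numbers, AI is no cheaper than the human at 10,000 tasks a year; at a million tasks it costs cents per task. The plumbing, not the model, decides when the economics tip.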

"The carrot isn't AI. The carrot is: We can throw 100x more thinking at our problems than we used to, for roughly the same spend."

That's a very different pitch than "let's automate some jobs."

The Investment Reality: What Companies Are Actually Spending

If this sounds theoretical, the numbers make it concrete. Global AI spending hit $154 billion in 2024, and it's forecast to reach $300 billion by 2027.

For the average mid-to-large enterprise, annual AI spend now sits around $6.4 million, broken down like this:

Average Enterprise AI Investment Breakdown (Annual)

  • AI Software & Platforms: $2.4M (+47% year-over-year growth)
  • AI Talent & Consulting: $1.8M (+52% year-over-year growth)
  • Infrastructure & Compute: $1.2M (+34% year-over-year growth)
  • Training & Development: $650K (+61% year-over-year growth)

— Second Talent, "AI Adoption in Enterprise Statistics 2025"

Notice what's happening: the fastest-growing line item is training and development (up 61% year-over-year), followed by talent and consulting (up 52%). Companies are learning that the technology is the easy part. Building the capability—the people, processes, and judgment to deploy AI well—is where the real investment goes.

Per-Inference Economics: The Hidden Burn Rate

Let's zoom into the unit economics. The price collapse has been staggering: GPT-4 launched at $60 per million output tokens in 2023. By late 2025, GPT-4.1 nano costs just $0.40 per million—a 99% drop in under three years. Even GPT-5, with vastly superior capabilities, costs less than GPT-4 did at launch. Sounds like the problem is solved, right?

But scale changes everything. One fintech reported their enterprise chatbot was burning $400 per day per client. For AI companies like OpenAI, infrastructure costs represent roughly 75% of revenue. That's not a sustainable margin—it's a subsidy hoping for volume.
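
A rough burn-rate calculation shows why both things can be true at once. The two prices are the list prices quoted above; the query volume and tokens per reply are assumptions for illustration.

```python
# Daily burn-rate sketch using the list prices quoted above ($60 vs $0.40 per
# million output tokens). Query volume and tokens per reply are assumptions.
QUERIES_PER_DAY = 10_000
OUTPUT_TOKENS_PER_REPLY = 800          # output only; input/context tokens add more on top

def daily_output_cost(price_per_million: float) -> float:
    return QUERIES_PER_DAY * OUTPUT_TOKENS_PER_REPLY / 1e6 * price_per_million

print(f"GPT-4 launch pricing: ${daily_output_cost(60.00):,.2f}/day")   # ~$480/day
print(f"GPT-4.1 nano pricing: ${daily_output_cost(0.40):,.2f}/day")    # ~$3.20/day
```

At the old pricing, even a modest 10,000-query workload costs hundreds of dollars a day on output tokens alone; at today's low-end pricing the same workload costs a few dollars. Scale, context size, and model choice decide which of those worlds you live in.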

The Marginal Cost Promise

Here's where the economics get interesting. Once you've built the plumbing—the integration, governance, monitoring, and escalation paths—the cost of running one more query or handling one more task drops toward near-zero.

Compare that to humans:

Marginal Cost Comparison

Human Worker

To handle 10% more tasks, you typically need to:

  • Hire another person (months of lead time)
  • Pay full salary + on-costs
  • Onboard, train, and ramp up (3–6 months)
  • Accept coordination overhead increases non-linearly

Marginal cost: Nearly the same as average cost. Scaling is expensive and slow.

AI System

To handle 10% more tasks, you typically:

  • Run the same infrastructure (already built)
  • Pay a few extra cents in token costs
  • Scale instantly (no hiring, no onboarding)
  • Coordination overhead stays flat or decreases

Marginal cost: Pennies. Scaling is near-instant and near-free once the platform exists.

This is the promise executives are buying: elastic cognitive capacity. Apply 10x, 100x, even 1,000x more thinking to a problem without hiring a village.

But—and this is critical—that only holds if you've built the right plumbing. If every AI call requires manual review, custom integration, or one-off fixes, you've just recreated the human scaling problem with different tooling.

How to Talk to Executives About This

Language matters. A lot.

If you talk about "robots," executives hear science fiction. If you talk about "automation," they hear job cuts—and then HR and middle management start resisting.

But if you talk about cheap cognition, you're talking about capability expansion. You're shifting the conversation from cost centre to force multiplier.

The Pitch That Lands

"We can apply 100x more thinking to our strategic problems than we could before. The cost structure for analysis, decision support, and customer insight has fundamentally changed."

"Cognition used to be our constraint. Now it's abundant. The question is: where do we deploy it for asymmetric advantage?"

Notice what that does: it reframes AI from a replacement story to an amplification story. You're not cutting heads. You're expanding what's possible.

What Executives Actually Care About

Research shows that when enterprises evaluate AI investments, they prioritise:

  1. Measurable value delivery (30%): Can you show ROI in business terms?
  2. Industry-specific customisation (26%): Does it fit our context?
  3. Price considerations (1%): Cost matters, but it's tertiary
— Punku AI, "State of AI 2024 Enterprise Adoption"

Translation: executives don't care about tokens, parameters, or whether you're using GPT-5 versus Claude. They care about outcomes they can defend in a board meeting.

So frame your AI proposals in their language: translate from technology capability to business impact. That's the conversation that unlocks budget.

The Reality Check: It's Not Actually "$1 Per Hour"

Let's be honest: the "$1 per hour AI" line is marketing. When you add up the on-costs—governance, monitoring, integration, incident response, model maintenance—AI isn't free.

But here's the crucial comparison: you need to benchmark AI's all-in cost against the all-in cost of human cognition. Not salary versus tokens. Total cost of ownership versus total cost of ownership.

When you do that math properly—and we'll detail the AI on-costs in Chapter 9—the economic case still holds for the right deployments. The key phrase: for the right deployments.

When the Economics Work

The cognition cost advantage isn't universal. It kicks in when you have:

High Volume

Enough cognitive tasks that the upfront investment in plumbing pays off quickly. If you're only running 100 queries a month, stick with humans.

Parallelisable Work

Tasks that can run simultaneously without coordination overhead. One human handles one thing at a time; one AI platform can handle thousands in parallel.

Time Flexibility

Work that doesn't require instant responses. Batch processing, overnight analysis, ticket queues—anywhere you can trade latency for accuracy and thoroughness.

If you're missing any of those three, the economics get murky fast. And if you need real-time, high-stakes, one-shot-correct answers—the kind where a customer is waiting on the other end—AI might actually be more expensive than a human when you account for all the guardrails and verification layers you need.

That's not a technology limitation. It's an economic reality. And understanding where AI's cost advantage kicks in versus where it evaporates is the difference between a project that compounds value and one that burns budget.

Which brings us to the next chapter: the asymmetry nobody talks about.

Chapter Takeaways

  • The AI business case is about cost per unit of cognition, not "AI" as a buzzword
  • Human costs include massive on-costs: management, facilities, coordination, training—not just salary
  • AI costs include infrastructure, governance, monitoring, and ops—not just tokens
  • The promise: marginal cost of additional thinking trends toward near-zero once plumbing is built
  • Talk to executives about capability amplification, not job replacement—it unlocks budget and reduces political resistance
  • The economics work best for: high volume + parallelisable tasks + time flexibility

The Asymmetry Nobody Talks About

Two support teams at different companies receive the same customer question at 9am on a Monday.

Tale of Two Deployments

Scenario A: Live Chat

  • Customer frustrated, expects response in 30 seconds
  • AI chatbot has one shot to be right
  • Wrong answer → escalation → bad NPS
  • Screenshot-ready failure for social media

Success criteria: Fast + accurate + one chance

🎯 Scenario B: Ticket Queue

  • Response expected within 4 hours
  • AI agent can check history, cross-reference systems
  • Can escalate if uncertain
  • Customer already primed for measured response

Success criteria: Thorough + verifiable + time flexibility

The paradox: Same AI technology. Same customer question. Completely different success rates.

This asymmetry is why 72% of customers say chatbots are a complete waste of time—yet the same AI technology deployed in ticket systems delivers 40-60% cost savings with measurably better accuracy.

The Latency-Accuracy Trade-Off

In a live chat session, your AI effectively has one shot to be right. There's a real customer on the other end. They're frustrated. They've already waited through the automated phone tree. They're one bad experience away from tweeting about your brand.

No opportunity to be correct on the third attempt. No room for iterative refinement. No forgiveness for "getting there eventually."

What Real-Time Reliability Actually Costs

To get that one-shot accuracy, you need to throw everything at the problem:

  • A stronger model (more expensive per call, with added latency)
  • Thinking models for complex reasoning (adds significant latency—seconds to minutes—trading speed for accuracy)
  • Rich context from multiple systems—retrieval, CRM, policies, knowledge base
  • Guardrails—security filters, PII checks, brand tone alignment
  • Maybe a second-pass checker to catch hallucinations and validate responses

All of that adds latency and cost per interaction. Every safeguard you add makes the response slower. Every verification step increases the risk that your customer is staring at a "typing..." indicator for 8, 10, 15 seconds.
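
A rough latency budget makes the problem visible. The stage timings below are assumptions, not measurements, but the arithmetic is the point: each safeguard adds seconds, and they stack.

```python
# Rough latency budget for a guarded live-chat pipeline.
# Stage timings are illustrative assumptions, not measurements.
stage_ms = {
    "retrieve context (CRM, policies, knowledge base)": 1_200,
    "reasoning model call": 4_000,
    "guardrails (security, PII, brand tone)": 600,
    "second-pass hallucination check": 2_500,
}
total_seconds = sum(stage_ms.values()) / 1_000
print(f"~{total_seconds:.1f}s before the customer sees a reply")   # ~8.3 seconds of 'typing...'
```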

"Giving a quick and accurate answer is not something AI is equipped to deal with at the moment. You can't get it right on the third go around, because there's a real customer on the other end."

The Research Confirms It

Real-time AI monitoring consistently shows higher false positive rates compared to batch processing. Why? Limited context and the pressure to make quick decisions force the AI to operate with incomplete information.

— Galileo AI, "Real-Time vs Batch Monitoring for LLMs"

The same pattern holds across domains: when AI systems need to respond instantly, they trade accuracy for speed. More complex models provide higher accuracy but require more processing time. This fundamental trade-off shapes success or failure in deployment.

— BugFree AI, "Latency vs Accuracy Tradeoffs in Real-Time Systems"

Why Humans Still Win (For Now)

Here's the uncomfortable truth: in low-latency, high-stakes, ambiguous situations, a trained human support agent often still outperforms AI.

Even if that human takes 30 seconds or a minute to respond, they're bringing advantages that current AI can't replicate under time pressure:

Tacit Knowledge

Years of experience with internal tools. Pattern-matching that happens intuitively. "I've seen this before" recognition.

Real-Time Adjustment

Can read customer tone and pivot mid-conversation. "No, that's not what I meant" gets handled gracefully.

Judgment Calls

Knows when to break policy to keep a customer. Can spot when "standard answer" will make things worse.

Recovery Mode

"Let me try that again" works. Customers forgive humans being slightly wrong if correction is quick and genuine.

The human doesn't need perfect accuracy on the first try. They can course-correct. They can say "I'm checking with our billing team" and buy time without losing trust. They bring social intelligence to delicate situations.

AI in live chat doesn't get these affordances. It needs to be right, safe, on-brand, and fast—all at once.

The Chatbot Disaster Data

If this feels like we're being too harsh on chatbots, the customer satisfaction data is even harsher.

The Numbers Don't Lie

  • 72% say chatbot interaction is a "complete waste of time"
  • 78% were forced to connect with a human after a chatbot failure
  • 63% said the chatbot interaction did not result in resolution
  • 80% said chatbots increased their frustration level
— UJET survey, Forbes 2022

The kicker: 64% would prefer companies not use AI for service at all. 53% would switch to a competitor offering human support.

— LinkedIn analysis of UK customer satisfaction surveys, 2024

The Attribution Problem

When customers experience chatbot failures, they don't blame "this specific instance"—they blame AI capabilities as a category. Because AI capabilities are seen as relatively constant and not easily changed, customers assume similar problems will keep recurring.

This creates a trust death spiral: one bad experience poisons the well for future interactions. Unlike human service failures (which customers attribute to "that specific agent having a bad day"), AI failures feel systemic and unfixable.

— Nature, "Consumer Trust in AI Chatbots: Service Failure Attribution"

The Flip: When AI Has Time

Now reverse the scenario. What happens when you remove the time pressure?

When AI has latency tolerance—minutes, hours, or overnight batch processing—the entire game changes.

What Changes With Time Flexibility

Read Full History

AI can parse every past interaction, invoice, support ticket, and account note—no skimming required.

Cross-Check Systems

Check CRM, billing, product database, knowledge base, policy docs—sequentially or in parallel, no rush.

Multi-Step Reasoning

Chain-of-thought reasoning, verify assumptions, explore edge cases. Try multiple approaches and select the best response.

Escalate When Uncertain

Instead of guessing, flag complex cases for human review. No customer is waiting, so escalation doesn't feel like failure.

The customer already expects a measured response—minutes or hours, not seconds. They submitted a ticket, not initiated a chat. Their mental model already accommodates asynchronous communication.

In this context, AI can really put thought into each response. It's not fighting the clock. It's playing to its strengths.

The Batch Processing Economics

The cost structure completely inverts when you move from real-time to batch processing.

Infrastructure Cost Comparison

Batch processing typically delivers 40-60% infrastructure cost savings compared to real-time systems, with savings increasing at higher volumes.

Example: 1 million requests daily

Real-time system: 100 GPU instances running 24/7

Batch system: 20 GPUs running during off-peak hours

Same volume. Same work. 80% fewer resources.

— Zen van Riel, "Should I Use Real-Time or Batch Processing for AI: Complete Guide"

Beyond cost, batch monitoring provides more accurate analysis than real-time. With a broader view across datasets and time to verify patterns, false positive rates drop significantly.

— Galileo AI, "Real-Time vs Batch Monitoring for LLMs"

The Fundamental Asymmetry

We're describing two fundamentally different games with the same technology.

Game 1: Live Interaction
  • Requirement: One-shot accuracy under time pressure
  • Constraint: Must respond in seconds
  • Cost: Expensive safeguards, stronger models, still often fails
  • Customer expectation: Immediate, perfect
  • Error tolerance: Near zero

⚠️ AI's weak zone

Game 2: Batch/Queue
  • Requirement: Iterative accuracy with verification
  • Constraint: Can respond in minutes/hours
  • Cost: 40-60% cheaper, higher accuracy
  • Customer expectation: Thoughtful, thorough
  • Error tolerance: Can escalate if uncertain

✓ AI's strong zone

Same AI capabilities. Completely different outcomes.

The asymmetry is this: AI is terrible at fast + accurate + one-shot. But it's brilliant at batch processing with time flexibility.

Most companies deploy AI exactly where it's weakest—live customer interactions with no room for error—and then wonder why it fails.

"If you're trying to replace a human in a live, high-pressure interaction, you're fighting the technology. If you're turning big slow queues into fast parallel flows, you're playing to its strengths."

Chapter Takeaways

  • AI faces a fundamental latency-accuracy trade-off that can't be engineered away
  • Live interactions require expensive safeguards that still often fail—72% of customers say chatbots waste their time
  • Humans still win in low-latency, high-ambiguity, high-stakes contexts through tacit knowledge and recovery
  • Batch processing delivers 40-60% cost savings and higher accuracy vs real-time systems
  • Hallucinations aren't the real risk—systemic wrongness at scale and silent drift are
  • Deploy in batch/queue contexts where time flexibility exists, not in live-chat contexts where humans excel

The 2x2 That Predicts Success

You've heard the pitch: AI will transform customer service, accelerate decision-making, automate operations. Then you deploy a chatbot that frustrates customers. Launch a real-time analytics system that costs three times more than projected. Build an autonomous approval engine that your compliance team refuses to certify.

The problem isn't the technology. It's that we lack a simple framework for predicting which AI projects will succeed versus which will fail. Let me give you one you can hold in your head.

TL;DR

  • Map AI projects on two axes: latency tolerance (can we wait?) × error cost (what happens if wrong?)
  • Bottom-right quadrant = prime AI territory: high latency tolerance, low error cost, 40-60% cost savings
  • Left side = human-led territory: low latency tolerance requires instant responses AI can't reliably deliver
  • Most chatbot failures are left-side problems deployed as right-side solutions
  • Move the conversation from "we need a chatbot" to "we need an AI queue brain"

The Two Dimensions That Matter

Every AI deployment decision can be mapped on two critical dimensions. Understanding where your use case sits on this framework will predict success or failure with remarkable accuracy.

X-Axis: Latency Tolerance

Can this work wait? That's the first question. Not "should it wait" but "can it wait without breaking the business process or disappointing the customer?"

Left Side: Must Respond in Seconds

Examples: Live chat support, synchronous API calls, real-time fraud detection, on-the-fly pricing, delicate negotiations, high-stakes emergency decisions

Constraint: Human is waiting. Every second of latency is visible and costly.

AI Challenge: Accuracy requires context, verification, and reasoning time. Speed demands shortcuts.

Right Side: Can Respond in Minutes / Hours / Overnight

Examples: Ticket queues, batch analytics, overnight reports, document processing, CRM scoring, research tasks, data reconciliation

Advantage: No one is waiting. AI can take time to verify, iterate, and improve quality.

AI Sweet Spot: Time flexibility allows extended thinking, multi-pass verification, and higher accuracy.

Y-Axis: Consequence of Error

What happens if the AI gets it wrong? This determines how much verification, oversight, and human review you need in the loop.

Bottom: Cheap to Fix

Examples: Internal notes wrong, trivial customer inconvenience, easy re-run, draft documents, research summaries

Recovery: Human can spot-check, customer can ask for clarification, system can re-process easily

Design Implication: High autonomy acceptable. Focus on throughput over perfection.

Top: Expensive / Regulated / Safety-Critical

Examples: Compliance violations, customer churn risk, legal exposure, financial loss, safety incidents, regulatory breaches

Risk: Error costs exceed benefit. Single failure damages trust or triggers audit.

Design Implication: Requires human-in-the-loop, approval gates, audit trails, and graceful escalation.
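
The two questions translate directly into a decision rule. The sketch below is a minimal illustration of that mapping, with hypothetical project labels; the point is that two honest answers are enough to place any use case on the grid.

```python
def classify_use_case(can_wait: bool, error_is_costly: bool) -> str:
    """Map a use case onto the 2x2: latency tolerance x consequence of error."""
    if can_wait and not error_is_costly:
        return "Prime AI territory: full AI autonomy (ticket queues, batch analytics)"
    if can_wait and error_is_costly:
        return "Copilot + gate: AI drafts and analyses, human approves"
    if not can_wait and not error_is_costly:
        return "Speed over perfection: AI-assisted, human-led"
    return "Danger zone: human-led only, AI surfaces context"

# Example mapping of a small project portfolio (labels are illustrative):
for name, can_wait, costly in [
    ("Overnight CRM scoring", True, False),
    ("Credit policy change analysis", True, True),
    ("Live chat for emergency claims", False, True),
]:
    print(f"{name}: {classify_use_case(can_wait, costly)}")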

The AI Deployment Framework

Vertical axis: consequence of error (higher = more costly to get wrong). Horizontal axis: latency tolerance (further right = more time flexibility).

🚨 Danger Zone

High-stakes + Real-time

Human-led ONLY

• Live emergency response

• Real-time compliance

• Critical negotiations

AI: Context surfacing only

⚖️ Copilot + Gate

High-stakes + Time flexibility

AI drafts, human approves

• Credit policy changes

• Complex pricing

• Regulatory reports

AI: Deep analysis + recommendation

⚡ Speed Over Perfection

Low-stakes + Real-time

AI-assisted, human-led

• Live chat (low-value)

• Simple FAQs

• Instant categorization

AI: Surface context, human responds

🎯 Prime AI Territory

Low-stakes + Time flexibility

Full AI autonomy

• Ticket triage & resolution

• Overnight batch analytics

• Report generation

• Document processing

40-60% cost savings here

The Four Quadrants Explained

Bottom-Right: Prime AI Territory

This is where AI dominates. High latency tolerance gives AI time to think, verify, and iterate. Low error cost means mistakes are cheap to catch and fix. The economics are compelling.

Sweet spots in this quadrant:

  • Ticket queues: Triage incoming requests, resolve simple cases autonomously, draft detailed responses for complex cases, surface relevant context for human agents
  • Overnight batch jobs: Transaction analysis, CRM lead scoring, anomaly detection, data quality checks, reconciliation tasks
  • Report generation: Research synthesis, competitive intelligence, quality assurance at scale, documentation updates
  • Internal operations: Document classification, data extraction, workflow orchestration, process monitoring
"Batch processing typically reduces infrastructure costs by 40-60% compared to real-time systems, with savings increasing at higher volumes. A real-time system processing 1 million requests daily might require 100 GPU instances running 24/7, while a batch system could process the same volume with 20 GPUs running during off-peak hours."
— Zen van Riel, "Should I Use Real-Time or Batch Processing for AI?"

Top-Right: AI + Human Sign-off

When error cost is high but you have time flexibility, deploy the "copilot + gate" pattern. AI does the heavy analytical work—research, modeling, draft recommendations—and a human reviews and approves before execution.

Copilot + Gate: Use Cases

AI's Role
  • Analyze historical patterns and scenarios
  • Generate policy recommendations with supporting evidence
  • Model potential outcomes and risks
  • Draft comprehensive reports with citations
  • Surface edge cases and compliance considerations
Human's Role
  • Review AI's reasoning and verify accuracy
  • Apply judgment to political/cultural factors
  • Make final go/no-go decision
  • Take accountability for the outcome
  • Override when context demands different approach

Example scenarios: Credit policy changes, complex B2B pricing decisions, regulatory report preparation, strategic vendor recommendations, M&A due diligence analysis

The key: AI multiplies the human's analytical capacity by 10-100x, but the human retains decision authority and accountability. You get better decisions faster, without sacrificing oversight.

Left Side: Human-Led Territory

When latency tolerance is low—when someone is waiting for an immediate response—human expertise still wins. This is true regardless of whether error cost is high or low.

Why humans still lead in real-time contexts:

  1. Accuracy requires time. AI needs context, verification, and reasoning cycles to deliver high-quality responses. Real-time pressure forces shortcuts that degrade accuracy.
  2. Humans read social cues. In live interactions—especially delicate or emotional ones—humans pick up tone, frustration, urgency signals that text-based AI misses.
  3. Escalation is inevitable. When AI hits its limits in a live context, the handoff to a human feels like failure. Better to start with human-led and use AI as augmentation.
"Over 72% of respondents in a recent UJET survey reported that interaction with a chatbot is a 'complete waste of time.' 78% of consumers have interacted with a chatbot in the past 12 months—but 80% said using chatbots increased their frustration level."
— Forbes, "Chatbots and Automations Increase Customer Service Frustrations"

The Framework in Practice

Mapping Your Use Cases

Here's how to use this framework immediately:

  1. List your current or planned AI projects. Be specific: "AI chatbot for customer support" or "Automated credit approval" or "Overnight analytics reports."
  2. Ask the two questions for each project:
    • Can this work wait minutes/hours, or must it respond in seconds?
    • If AI gets this wrong, is it cheap to fix or expensive/regulated/safety-critical?
  3. Place each project on the 2x2 grid. Be honest about where it actually sits, not where you wish it would sit.
  4. Projects in bottom-right quadrant: Green light. These are likely to succeed and deliver strong ROI. Focus your investment here.
  5. Projects on the left side: Redesign as human-led with AI augmentation. Don't deploy as autonomous systems.
  6. Projects in top-right: Good candidates, but require approval gates and audit trails. Budget for human review overhead.

Common Misallocations

The most common AI project failures happen when organizations misread where their use case sits on the grid:

The Chatbot Mistake

❌ What Organizations Think

  • • "Live chat is just text-based, so AI can handle it"
  • • "Customers will tolerate some errors if responses are instant"
  • • "We'll save money by automating our support team"

Result: 78% of customers escalate to humans anyway. Frustration increases. AI gets blamed.

✓ What Actually Works

  • • "Live chat is left-side: low latency tolerance"
  • • "AI surfaces context; human owns the conversation"
  • • "Ticket queues are right-side: AI can resolve 40-60% autonomously"

Result: Ticket resolution faster and cheaper. Live chat becomes more effective. Customers happier.

Real-Time Fraud Detection Without Escalation

Misallocation: Deployed as autonomous system in top-left quadrant (high error cost + low latency tolerance).

Problem: False positives block legitimate transactions. False negatives allow fraud through. No time for verification.

Fix: AI flags suspicious activity with confidence scores. High-confidence blocks proceed autonomously. Medium-confidence triggers human review. Low-confidence transactions pass with monitoring.

Pro Tip: Run a second batch job when latency isn't critical—with more detailed analysis, a smarter model, extended thinking, and comparison against broader data patterns and other customer interactions. This second pass catches what the real-time system missed, flagging additional transactions for review. Even tickets routed to human-in-the-loop should arrive with AI-generated augmentation: context summaries, similar past cases, confidence breakdowns, and recommended actions.
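
As a sketch, the tiered routing might look like this. The thresholds are illustrative assumptions and would be tuned to the portfolio and risk appetite.

```python
def route_transaction(fraud_confidence: float) -> str:
    """Confidence-tiered routing; thresholds are illustrative, not recommendations."""
    if fraud_confidence >= 0.95:
        return "block"              # high confidence: act autonomously
    if fraud_confidence >= 0.60:
        return "human_review"       # medium confidence: queue with AI-generated context summary
    return "allow_and_monitor"      # low confidence: pass now, re-check in the overnight batch pass
```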

Autonomous Customer Service with No Fallback

Misallocation: Chatbot designed to handle all inquiries without clear escalation path.

Problem: AI loops on canned responses when it can't understand. Customer gets frustrated. Eventually finds phone number or gives up.

Fix: Recognize you're on the left side. Use AI for triage and simple cases only. Route complex cases to ticket queue for async resolution. Provide instant "talk to human" button.

Batch Processing: The Economics

The bottom-right quadrant's economic advantage isn't obvious until you understand the infrastructure math behind batch versus real-time processing.

Why Batch Wins

Batch processing delivers 40-60% cost savings because of four structural advantages:

Real-time versus batch, dimension by dimension:

  • Architecture: real-time is complex (load balancers, autoscaling, hot standby, real-time data pipelines); batch is simple (scheduled jobs, sequential processing, standard ETL).
  • Resource provisioning: real-time is sized for peak load 24/7, even if peak is 2 hours/day; batch is sized for total volume and run during off-peak pricing windows.
  • Verification: real-time is limited because speed is the priority; batch is extensive and can iterate and verify before delivery.
  • Operational overhead: real-time is high (monitoring, alerting, 24/7 incident response); batch is low (failures can wait until morning, debugging is easier).
"Batch = Simpler, slower, cheaper. Streaming = Faster, more complex, higher ops overhead. Most mature systems combine both—batch for deep, historical analysis; streaming for instant reactions. Pick the right model based on latency tolerance, data volume, and complexity."
— Nikki Siapno, "Batch Processing vs Real-time Streaming"

The Infrastructure Math
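
To see why the provisioning gap matters, re-run the 1-million-requests example above on compute alone. The GPU counts come from that example; the hourly rates and the batch window are assumptions.

```python
# Recomputing the 1M-requests/day example on compute alone.
# GPU counts come from the cited example; rates and batch window are assumptions.
realtime_gpu_hours = 100 * 24            # sized for peak, running 24/7
batch_gpu_hours    = 20 * 8              # sized for total volume, run in an off-peak window

RATE_PEAK, RATE_OFFPEAK = 2.50, 1.50     # assumed $/GPU-hour

realtime_daily = realtime_gpu_hours * RATE_PEAK      # $6,000/day
batch_daily    = batch_gpu_hours * RATE_OFFPEAK      # $240/day
print(f"Real-time: ${realtime_daily:,.0f}/day  Batch: ${batch_daily:,.0f}/day")
```

On raw compute the gap looks even larger than the 40-60% headline, presumably because that figure covers the whole infrastructure stack (storage, orchestration, monitoring) rather than GPUs alone, and because real workloads rarely fit a tidy off-peak window. The direction, though, is the same: batch wins.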

Selective Real-Time Pattern

You don't have to choose between batch and real-time for your entire system. The smartest architectures use both:

  • Apply real-time processing only to high-value or time-sensitive operations where the benefit of instant response justifies the cost
  • Route everything else to batch queues for overnight or scheduled processing
  • Use AI agents to decide routing based on request characteristics: urgency signals, customer tier, complexity assessment

This "selective real-time" pattern gets you the best of both worlds. Critical work happens instantly. Everything else benefits from batch economics and higher quality.

Moving the Conversation

The framework changes how you talk about AI projects. It shifts the conversation from technology ("we need a chatbot") to business context ("we need to process high-volume cognitive work with time flexibility").

From Chatbot to Queue Brain

Reframing the AI Discussion

❌ Stop Saying:

  • • "We need an AI chatbot for customer support"
  • • "Let's automate our live agents with AI"
  • • "AI will make our real-time systems faster and cheaper"

✓ Start Saying:

  • • "We need an AI queue brain for ticket resolution and an AI context engine for live agents"
  • • "Let's use AI to multiply agent effectiveness by 2-3x while keeping humans in the driver's seat"
  • • "AI excels in batch/queue contexts—let's find our highest-volume async workflows"

Impact: This reframing immediately filters out low-probability-of-success projects and focuses investment on the bottom-right quadrant where ROI is proven.

The Strategic Questions

When evaluating AI opportunities, ask these two questions first:

  1. "Where do we have high-volume cognitive work with time flexibility?"

    Look for: ticket backlogs, overnight batch jobs, report generation, document processing, research tasks, data reconciliation, quality checks, triage workflows.

  2. "Where are errors cheap to fix or catch before they cause harm?"

    Look for: internal-only processes, draft outputs that humans review, operations with undo capability, contexts where verification is fast/easy.

The intersection of these two questions is your AI project pipeline. Prioritize based on volume, current cost, and strategic importance.

Chapter Takeaways

  1. Map AI projects on two axes: Latency tolerance (can this work wait?) × Error cost (what happens if AI is wrong?). This 2x2 predicts success better than any technology assessment.
  2. Bottom-right quadrant = prime AI territory: High latency tolerance + low error cost delivers 40-60% cost savings, simpler architecture, and higher quality outputs.
  3. Left side = human-led territory: When latency tolerance is low, humans still win. Deploy AI as augmentation (context surfacing, draft responses), not replacement.
  4. Chatbot failures are allocation errors: 78% of customers escalate to humans because live chat sits on the left side (low latency tolerance) but is deployed as if it's on the right side.
  5. Batch economics beat real-time: Same volume, 70% lower cost. Batch systems provision for total volume during off-peak hours. Real-time systems provision for peak load 24/7.
  6. Selective real-time is the mature pattern: Route high-value/time-sensitive work to real-time processing. Route everything else to batch queues. Use AI to decide routing.
  7. Reframe the conversation: Stop saying "we need a chatbot." Start saying "we need an AI queue brain for async work and AI augmentation for live agents."
  8. Ask the two strategic questions: (1) Where do we have high-volume cognitive work with time flexibility? (2) Where are errors cheap to catch? The intersection is your AI pipeline.

What's Next

The 2x2 framework tells you where to deploy AI. But it doesn't tell you what becomes possible when cognition becomes abundant and cheap. That's what Version 3 unlocks: work that was previously too expensive to even attempt.

In the next chapter, we'll explore the three versions of AI value—and why most organizations are still stuck on Version 1 while the real transformation happens at Version 3.

The Three Versions of AI Value

Most AI conversations stop at "automation"—replace human tasks, save money, move on. But that's only the first version, and ironically, it has the highest failure rate. To understand why some companies achieve 10x returns while 85% fail, you need to see the full progression: from cost reduction to capability amplification to entirely new frontiers that were never rational to attempt before.

TL;DR: The Three Versions

  • Version 1 (Automation): Same work, fewer people—highest failure rate, competing with humans at what they do well
  • Version 2 (Scale): 10-100x more thinking at same problems—$3.70-$10.30 ROI per dollar, check everything instead of sampling
  • Version 3 (Frontier): Previously impossible thinking—work that was never rational because coordination overhead killed it

Version 1: Same Work, Fewer People

This is the classic automation play. Replace a human task with AI. Invoice processing, email triage, basic data entry—the stuff that shows up first in consultant decks and vendor demos. It's the most common deployment pattern, and it's also where most of the 70-85% failure rate lives.

Version 1: The Automation Play

What it is: Replace a human doing task X with AI doing task X

Value promise: Cost reduction through headcount savings

Why it often fails: You're competing with humans at what humans do reasonably well, in contexts optimised for human cognition over decades

When it works: Only in the right quadrant—high latency tolerance, low error cost, volume high enough to justify infrastructure

Why does this fail so often? Because you're asking AI to beat humans in environments humans designed for themselves. The workflows were built around human strengths. The tools reflect human mental models. The edge cases got handled through years of accumulated judgment.

When you drop AI into these contexts—especially live, high-stakes interactions—you're fighting the technology. And as we saw in Chapter 3, when that fight happens in live customer service, 72% of customers call the result a complete waste of time.

When Version 1 Actually Works: Klarna's $40M Win

But Version 1 isn't always doomed. When deployed in the right context—remember the 2x2 from Chapter 4—it can deliver extraordinary results.

Case Study: Klarna's AI Assistant

The Setup

OpenAI-powered AI assistant handling customer service interactions

The Scale

Replaced work of 700 customer service agents

The Math

  • Direct salary savings: $28M (700 × $40K)
  • Operational cost savings: $12M+ (training, benefits, facilities)
  • Less technology costs: $2-5M
  • Net benefit: $35-38M annually

The Key Insight

This isn't live real-time support with one-shot accuracy requirements. It's ticket handling—batch-adjacent work where AI has time to think, verify, and escalate when uncertain.

Source: Analysis of Klarna AI implementation, 2024 annual report

Klarna's success illustrates the critical distinction: they didn't deploy AI for instant-response live chat. They used it for ticket-based interactions where the system had time to:

  • Read full customer history
  • Cross-check policies and account details
  • Run multi-step reasoning
  • Escalate complex cases to humans

That's bottom-right quadrant deployment: high latency tolerance, manageable error cost, massive volume. Version 1 in the right context delivers. Version 1 in live-chat contexts? That's where the 72% "waste of time" verdict comes from.

Version 2: 10-100x More Thinking at Same Problems

This is where the conversation gets interesting. Version 2 isn't about replacing people—it's about amplifying cognitive output beyond what was economically feasible before.

The fundamental shift: instead of sampling, you check everything.

Version 2: The Scale Play

What it is: Apply 10-100x more cognitive analysis to problems you already have

Value promise: Depth and coverage impossible with human-only teams

Examples: One analyst sampling 50 transactions → AI checks every transaction. Triaging 100 tickets/day → AI triages 10,000. Spot-checking CRM → row-level analysis overnight.

ROI reality: Companies achieving Version 2 report $3.70-$10.30 return per dollar invested (Source: Fullview AI Statistics 2025)

The Marginal Cost Revolution

Here's why Version 2 changes the game: the economics of additional thinking have inverted.

Old Economics vs New Economics
Human Cognition Model
  • Each additional hour of thinking = full hourly cost
  • Marginal cost scales linearly
  • Must choose: sample or go bankrupt
  • 50 transactions checked, 10,000 not
  • "We can't afford to analyze everything"
AI Cognition Model
  • Infrastructure cost is fixed; usage is variable
  • Each additional task = marginal inference cost only
  • Checking 10,000 vs 50 costs ~same
  • "Once plumbing is in, marginal cost → cents"
  • "We can afford to analyze everything"

This is the promise from Chapter 2 made real: marginal cost per "thinking task" trends toward cents instead of dollars. Version 2 is where you cash that promise.
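
Here is the inversion in miniature. The unit costs are illustrative assumptions, but the structure is what matters: for the same spend, the human workflow samples a fraction of a percent while the AI workflow checks everything.

```python
# Same spend, radically different coverage. Unit costs are illustrative assumptions.
TRANSACTIONS = 10_000
HUMAN_COST_PER_CHECK = 4.00      # assumed loaded cost of one manual review
AI_COST_PER_CHECK = 0.02         # assumed marginal inference cost per item, plumbing already built

sample_size = 50
human_sampling_cost = sample_size * HUMAN_COST_PER_CHECK     # $200 for 0.5% coverage
ai_full_coverage    = TRANSACTIONS * AI_COST_PER_CHECK       # $200 for 100% coverage
print(human_sampling_cost, ai_full_coverage)
```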

What Makes Version 2 Work

Three conditions consistently predict success:

  1. High volume. The more cognitive tasks, the better the ROI. If you're only analyzing 50 things, the infrastructure overhead doesn't justify AI. If you're analyzing 50,000 things, the math tilts heavily in AI's favor.
  2. Latency flexibility. Batch or async workflows preferred. Overnight analysis, scheduled reports, queue-based processing. When you remove the real-time pressure, AI can apply depth that humans can't match at scale.
  3. Errors are reversible or caught by humans. You're not betting the company on a single AI decision. Either the work is low-stakes, or there's a verification step, or mistakes get caught downstream.

Notice the pattern: none of these replaced humans. They amplified what humans could oversee. The analyst still reviews flagged transactions. The billing specialist still signs off on complex cases. The CS team still handles escalations. But the cognitive reach of each human expanded 10-100x.

"We can throw 100x more thinking at our problems than we used to, for roughly the same spend. That's not automation—that's capability amplification."

Version 3: Thinking That Was Previously Impossible

Now we reach the frontier—the work that organizations don't even attempt today because it would be economically or politically insane.

Version 3 isn't about accelerating current work. It's about making entirely new categories of work rational to attempt for the first time.

Version 3: The Frontier

What it is: Work that was never feasible before because coordination overhead, calendar time, or organizational patience killed it

Why it's possible now: AI doesn't have calendar time constraints, meeting fatigue, political navigation requirements, or coordination overhead that scales with team size

Value unlock: Not incremental improvement—entirely new competitive capabilities that were structurally impossible before

Examples: Hyper sprints (Chapter 6), marketplace-of-one personalization (Chapter 7), continuous strategic sensing, exhaustive scenario planning

The Human Limits Version 3 Bypasses

Why are certain kinds of thinking structurally impossible with human-only teams? Three constraints that don't apply to AI:

The Coordination Tax

Research from organizational behavior shows a brutal pattern: as team size increases, productive output per person plummets.

5-person team

Efficient. Everyone knows what everyone else is doing. Decisions happen in one room.

10-person cross-functional team

Getting messy. Need alignment meetings. Different functions have conflicting priorities. 50% of time goes to coordination.

25-person project team

Organizational nightmare. More meeting time than work time. Decisions optimized for consensus, not quality. "Everyone only tolerates a certain amount of ideas because everyone's just keen to get the job done."

50-100 person initiative

Never attempted. No one even tries because it's organizationally impossible. The coordination cost exceeds any plausible benefit.

AI multi-agent systems don't have this problem. A 100-agent system doesn't need meetings, politics, or consensus-building. Coordination overhead stays flat.

The Calendar Time Trap

Large strategic initiatives take months with human teams. Not because the analysis itself requires months—but because of scheduling, coordination, iteration cycles, and political navigation.

A typical pattern for a major strategic decision:

  • Week 1-2: Frame the problem, align on objectives
  • Week 3-6: Parallel workstreams gather data
  • Week 7-8: Consolidate findings, inevitable gaps emerge
  • Week 9-10: Second round of analysis
  • Week 11-12: Draft recommendations, socialize with stakeholders
  • Week 13-14: Revisions based on feedback
  • Week 15-16: Final presentation, decision

That's four months for a decision that, in compute time, represents maybe 40-60 hours of actual analytical work. The rest is waiting—for meetings, for feedback, for availability, for political alignment.

AI doesn't wait. What if that same analytical depth happened overnight?

The Political Acceptability Filter

Here's the uncomfortable truth about large organizational decisions: they don't optimize for the best answer. They optimize for the politically acceptable answer.

"Those big teams working on something, they're never tasked with finding the best answer. It's always an acceptable answer. And what's an acceptable answer that the senior management will swallow? It's based on a lot of experience, sure, but it's experience of getting things past the senior managers."

Committee-think optimizes for consensus under time pressure. Bold ideas get watered down. Risky options get rejected not because they're wrong, but because no one wants to be the one who championed the failed initiative. The final recommendation is what the group can agree on, not necessarily what the analysis suggests.

AI search doesn't have that constraint. It can explore the full solution space—including options humans would self-censor for political reasons—and surface them with clear reasoning about trade-offs.

What Becomes Possible: Version 3 Examples

Hyper Sprints (Chapter 6)

Replace months of cross-functional committees with overnight AI exploration of thousands of strategic options, complete with reasoning trails and rejected alternatives. Human experts review in the morning and redirect for the next sprint.

Marketplace of One (Chapter 7)

Shift from segment-based strategies to per-customer personalization—offers, pricing, service levels, communications—economically rational for the first time because AI can manage the combinatorial complexity humans can't.

Continuous Strategic Sensing

Always-on analysis of all customer interactions, market signals, competitive moves, and internal operations—spotting emerging patterns no human analyst would catch because no one reads everything.

Exhaustive Scenario Planning

Stress-test every strategic option against hundreds of future scenarios, document why each succeeds or fails under which assumptions—analysis that would take a McKinsey team six months, delivered overnight.

The Version 3 Question

If you only ask "Where do we waste human thinking time?", you're stuck in Versions 1 and 2.

The Version 3 question is different:

"What's one project we've never attempted because the coordination overhead was too high? That might be our Version 3."

What strategic analysis have you not done—not because it wouldn't be valuable, but because assembling a 50-person team for six months was organizationally insane?

What per-customer customization have you not offered—not because customers wouldn't value it, but because managing that complexity manually would be impossible?

What patterns in your data have you not looked for—not because they wouldn't be revealing, but because no one has time to read everything?

Those are your Version 3 opportunities.

The Progression Table

Here's how the three versions compare across key dimensions:

  • What changes: Version 1 is same work, fewer people; Version 2 is same problems, 100x more thinking; Version 3 is new problems becoming rational to attempt.
  • Value source: Version 1 is cost reduction; Version 2 is quality and coverage improvement; Version 3 is new capability creation.
  • Success rate: Version 1 runs 15-30% (unless it sits in the right quadrant); Version 2 runs 60-75%; Version 3 is unknown (it's the frontier).
  • ROI range: Version 1 is often negative to $2 per dollar; Version 2 is $3.70-$10.30 per dollar; Version 3 is potentially 50x+ (strategic moats).
  • Time horizon: Version 1 is immediate (months); Version 2 is near-term (6-18 months); Version 3 is strategic (2-5 years).
  • Competitive impact: Version 1 is parity (everyone automates); Version 2 is advantage (better execution); Version 3 is a moat (structurally impossible for others to copy).
  • Org change required: Version 1 is low; Version 2 is medium; Version 3 is high (new operating models).
  • Example: Version 1 is an invoice processing chatbot; Version 2 is fraud-checking every transaction; Version 3 is overnight strategic hyper sprints.

Where Most Organizations Are Stuck

The uncomfortable reality: most organizations are stuck in Version 1 because it's easiest to imagine. "AI, but inside the shapes of our current processes."

They look at their org chart, identify tasks humans currently do, calculate costs, and ask: "Can AI do this cheaper?"

That framing guarantees you miss Version 3 entirely—and explains why so many AI projects deliver underwhelming results.

Two Ways to Think About AI Projects

❌ The Version 1 Mindset

  • • "Where do humans waste time?"
  • • "What tasks can we automate?"
  • • "How much can we save in headcount?"
  • • Leads to: chatbots, RPA, task automation

Outcome: 70-85% failure rate, underwhelming ROI, "AI doesn't work for us"

✓ The Version 2-3 Mindset

  • • "What if cognition was essentially abundant?"
  • • "What analysis are we not doing because it's too expensive?"
  • • "What was structurally impossible before?"
  • • Leads to: exhaustive analysis, continuous sensing, new capabilities

Outcome: $3.70-$10.30 returns per dollar, strategic differentiation, compounding competitive advantages

The shift from Version 1 to Version 3 thinking requires asking a fundamentally different question.

What This Means for Your AI Strategy

The three-version framework gives you a lens to evaluate any AI project:

Version 1 Projects (Automation)

Treat with caution. Only proceed if:

  • You're in bottom-right quadrant (high latency tolerance, low error cost)
  • Volume is massive (10,000+ instances)
  • You're not competing with humans in contexts optimized for humans

Warning sign: If your pitch is "replace this person/team", expect 70%+ failure risk.

Version 2 Projects (Scale)

High confidence. Invest here. Look for:

  • Cognitive work you currently sample (now check everything)
  • Batch/async workflows with time flexibility
  • Opportunities to amplify human oversight 10-100x

Success pattern: $3.70-$10.30 return per dollar, humans stay in loop at critical points.

Version 3 Projects (Frontier)

Strategic bets. Start small, learn fast. Ask:

  • What analysis have we never done because coordination overhead was too high?
  • What per-customer customization can't we offer because complexity is unmanageable?
  • What strategic options have we never explored because it would take 50 people six months?

Strategic value: If successful, creates competitive moats competitors can't copy. Worth exploring even with uncertainty.

The Path Forward

Most organizations should pursue a portfolio approach:

  • 10-20% Version 1: Low-risk automation wins in proven contexts (Klarna-style ticket handling)
  • 60-70% Version 2: Scale plays with clear ROI—checking everything instead of sampling, amplifying human oversight
  • 10-20% Version 3: Strategic exploration of previously impossible work—hyper sprints, marketplace-of-one, continuous sensing

The Version 1 projects keep the CFO happy with near-term savings. The Version 2 projects deliver measurable ROI and build organizational capability. The Version 3 projects create strategic separation from competitors.

"If you only look for wasted human thinking, you're missing the greenfield where there was no human thinking at all—because it was never feasible. Once you show executives that second territory, project ideas shift from 'let's bolt AI onto X' to 'what would we do if cognition was essentially abundant?'"

In the next two chapters, we'll make Version 3 concrete with detailed examples: hyper sprints that replace committee-think with systematic AI search (Chapter 6), and marketplace-of-one personalization that was never economically rational before (Chapter 7).

But first, let's be clear about what we've established:

Chapter 5 Key Takeaways

  • Version 1 (automation) has the highest failure rate—70-85%—because it competes with humans in contexts optimized for humans
  • Version 2 (scale) delivers $3.70-$10.30 per dollar by applying 100x more thinking to existing problems—check everything instead of sampling
  • Version 3 (frontier) enables work that was structurally impossible before—strategic analysis that would require 50 people for six months can happen overnight
  • Most organizations are stuck at Version 1 because they only ask "Where do we waste human time?" instead of "What was never feasible before?"
  • Klarna's $40M win shows Version 1 can work—but only in batch/ticket contexts with latency tolerance, not live chat
  • The marginal cost revolution: once infrastructure is in place, each additional "thinking task" costs cents, not dollars—changes what's rational to attempt

Hyper Sprints: Replacing Committee-Think

Picture the typical enterprise strategic project: ten cross-functional people, three months of meetings, and a PowerPoint deck that everyone can live with. Now imagine a different approach—one where thousands of possibilities are explored overnight, leaving humans to do what they do best: make the final call with full visibility into what was considered and why.

TL;DR

  • Committee-think optimises for consensus under time pressure, not for finding the best answer
  • Hyper sprints use AI to systematically explore thousands of possibilities overnight
  • Extended thinking and multi-model councils achieve 97% accuracy vs 80% for single models
  • Tasks that took cross-functional teams three weeks become 200+ iterations completed between midnight and 6am

The Committee-Think Problem

We've all seen how big decisions get made in organizations. A cross-functional team is assembled—representatives from finance, operations, marketing, IT, maybe legal. There's a series of workshops and meetings. Stakeholders negotiate. Political navigation happens throughout. The goal, whether stated explicitly or not, is rarely to find the best answer. It's to find an acceptable answer.

"What's an acceptable answer that senior management will swallow? Based on experience of getting things past managers, not exploration of what's actually optimal."

This isn't a criticism of the people involved—it's a structural constraint. Research on group decision-making reveals the mechanisms at play:

The Groupthink Mechanisms

Groupthink is a tendency to avoid critical evaluation of ideas the group favors. A poor or defective group decision influenced by groupthink is characterized by a failure to consider other, more favorable alternatives before reaching a conclusion.

— NYU Steinhardt, Groupthink as System

Research identifies four dimensions of defective decision-making:

1. Failure to create contingency plans

The group settles on one path without planning for what could go wrong

2. Lack of information search

Limited exploration of data that might challenge the emerging consensus

3. Biased assessment of costs and benefits

Evaluations skewed toward the preferred option

4. Incomplete consideration of options

Premature narrowing to a small set of alternatives

The outcome is predictable: committees aren't tasked with finding the best answer—they're tasked with finding an answer that key stakeholders will accept. The process becomes political theater, a negotiation between prior preferences rather than a genuine exploration of possibilities.

The Hyper Sprint Alternative

What if you could take a problem you'd normally hand to a cross-functional group for two to three months and replace it with something fundamentally different? Not a committee meeting, but a search process:

  • Thousands of AI calls overnight exploring multiple frames, scenarios, and constraints
  • Full audit trail of what was considered, rejected, and why
  • Human experts review in the morning and redirect the search based on insights
  • Politics happen after seeing the full landscape, not during exploration

The Chess Engine Analogy

Here's the crucial distinction: you're not asking AI to magically know the answer. You're asking it to systematically explore more possibilities than humans would have time for—like a chess engine exploring move trees.

In chess, the human sets the objectives (win the game), defines the constraints (legal moves), and provides the evaluation criteria (piece values, positional strength). The engine explores huge chunks of the possibility space. The result? Move sequences that humans would never have time to consider.

"Humans are terrible at exploring large idea spaces under time and social pressure. AI is good at it, as long as humans shape the scoring and constraints."
— Scott Farrell

The same principle applies to strategic decisions: AI doesn't replace human judgment—it expands the search space so humans can judge from a position of visibility rather than guesswork.

Committee-Think vs Hyper Sprint

Aspect | Committee-Think | Hyper Sprint
Optimises for | Consensus, political acceptability | Search coverage, idea quality
Time to explore | Constrained by meeting schedules | Unconstrained (overnight runs)
Ideas considered | Whatever fits in PowerPoint | Thousands of options explored
Audit trail | Sparse meeting notes | Full reasoning trail preserved
Politics | During exploration (constrains thinking) | After seeing landscape (informed)
Outcome | Acceptable to stakeholders | Best option identified, then negotiated

Extended Thinking: The Enabling Technology

This kind of systematic exploration is only possible because of a fundamental shift in how AI systems work. Traditional language models optimize for speed—they generate the first plausible answer. But newer systems like OpenAI's o1 and Claude's extended thinking mode work differently.

Inference-Time Compute

o1 is trained with reinforcement learning to 'think' before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We're no longer bottlenecked by pretraining. We can now scale inference compute too.

— IESE, Chain of Thought Reasoning Breakthrough

What this means in practice: instead of optimizing models to answer instantly, we can give them compute budget to think through the problem. The AI explores multiple paths, evaluates trade-offs, considers edge cases—all before committing to an answer.

Translation: a smaller model given time to think can outperform a massive model forced to answer immediately. This inverts the economics. Instead of needing exponentially more training compute to improve performance, you can allocate inference-time compute—which is cheaper and scales linearly.

Extended Thinking in Practice

Extended thinking is a feature of both Claude Opus 4.5 and Claude Sonnet 4.5 that enables the model to expose its internal reasoning process in 'thinking' content blocks. Unlike standard mode—which optimizes for brevity and speed—extended thinking allocates more compute and context to produce deeper, multi-step reasoning workflows, crucial for complex code refactoring, strategic planning, and legal analysis.

— Comet API, How to Use Claude Extended Thinking

The trade-off is latency. Extended thinking prioritizes reasoning quality over raw speed. For hyper sprints, that's exactly what you want—overnight batch runs where thoroughness matters more than instant responses.
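To make this concrete, here is a minimal sketch of requesting extended thinking through the Anthropic Python SDK. The model identifier, token budgets, and prompt are illustrative assumptions rather than recommendations; consult current documentation for exact parameters.

```python
# Minimal sketch: requesting extended thinking via the Anthropic Python SDK.
# Model name and token budgets are illustrative assumptions, not recommendations.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",        # assumed model identifier
    max_tokens=16000,                 # room for both the thinking and the final answer
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,       # inference-time compute allocated to reasoning
    },
    messages=[{
        "role": "user",
        "content": "Evaluate three market-entry strategies for Southeast Asia "
                   "and rank them by regulatory risk.",
    }],
)

# The response interleaves 'thinking' blocks (the reasoning trail) with 'text'
# blocks (the answer). In a hyper sprint, you would persist both as the audit trail.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking[:200], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```

The specific numbers matter less than the principle: reasoning depth becomes a dial you set per task, which is exactly what overnight batch runs exploit.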

Discovery Accelerator Architecture

So how do you actually implement a hyper sprint? The architecture we've developed—called a Discovery Accelerator—uses three layers to create transparent, multi-dimensional reasoning that single-model systems can't replicate.

Layer 1: Director AI

Role: Orchestration, framing, curation, adaptation

Function: Sets objectives, defines constraints, curates results, redirects search based on human feedback

Example: "We need a market entry strategy for Southeast Asia. Consider regulatory, competitive, and operational constraints. Prioritize speed to market but flag high-risk options."

Layer 2: Council of Engines

Role: Specialized models with diverse perspectives

Function: Different models debate approaches—one optimizes for cost, another for speed, another for risk mitigation

Example: Operations brain flags supply chain constraints, revenue brain identifies monetization paths, risk brain surfaces regulatory blockers

Layer 3: Chess-Style Reasoning Engine

Role: Systematic exploration, rebuttal generation, pruning

Function: Explores idea combinations, generates counter-arguments, prunes dominated options, documents reasoning

Example: Explores ~100 strategic nodes/minute—deliberately paced for observability rather than raw speed—preserving full audit trail of what was considered and why options were rejected

The Feedback Dimension: Stream-of-Consciousness Relay

Role: Continuous reasoning relay between layers

Function: The stream-of-consciousness output from Layer 3 feeds back into Layers 1 and 2. The Director sees emerging patterns and adjusts framing. The Council receives new evidence to update their perspectives. Each layer's reasoning enriches the others in real-time.

Why it matters: Traditional pipelines are one-directional—data flows down, results flow up. This architecture creates a reasoning loop: the chess engine's exploration surfaces insights that reshape how the Director frames the problem and how the Council weighs trade-offs. A rejected path in Layer 3 might reveal a constraint the Council hadn't considered, triggering re-evaluation across all specialists.

Example: Chess engine explores "aggressive pricing" path → discovers regulatory constraint in market X → relays finding back → Risk brain escalates concern → Director reframes objective to "sustainable entry" → Council re-debates with new constraint → Chess engine explores revised solution space
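To show the shape of the loop rather than any production system, here is a deliberately simplified sketch of the three layers in code. Every function name below (frame_problem, council_debate, explore_node) is a hypothetical placeholder standing in for one or more LLM calls; a real Discovery Accelerator is considerably richer.

```python
# Simplified sketch of the Director / Council / Chess-engine reasoning loop.
# Every function here is a hypothetical placeholder for one or more LLM calls.

def frame_problem(objective, constraints):
    """Layer 1 (Director): turn a business objective into a search frame."""
    return {"objective": objective, "constraints": constraints, "findings": []}

def council_debate(frame, candidate):
    """Layer 2 (Council): specialist perspectives score one candidate.
    e.g. operations, revenue and risk 'brains' each return a score and concerns."""
    return {"score": 0.0, "concerns": []}

def explore_node(frame):
    """Layer 3 (Chess engine): propose the next candidate strategy to examine."""
    return {"idea": "placeholder strategy", "evidence": []}

def run_hyper_sprint(objective, constraints, budget_nodes=200):
    frame = frame_problem(objective, constraints)
    audit_trail = []
    for _ in range(budget_nodes):
        candidate = explore_node(frame)             # Layer 3 explores
        verdict = council_debate(frame, candidate)  # Layer 2 debates
        audit_trail.append({"candidate": candidate, "verdict": verdict})
        # Feedback dimension: findings flow back up and reshape the frame
        frame["findings"].extend(verdict["concerns"])
    # Layer 1 curates: humans review the full trail in the morning
    return audit_trail

if __name__ == "__main__":
    trail = run_hyper_sprint(
        objective="Market entry strategy for Southeast Asia",
        constraints=["regulatory", "competitive", "operational"],
    )
    print(f"Explored {len(trail)} strategic nodes with a full audit trail")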

Why Multi-Model Councils

The diversity advantage isn't theoretical—it's measurable:

Multi-Model Performance:

  • 97% accuracy with multi-model councils vs 80% for single models
  • Implements Andrew Ng's four agentic design patterns: Reflection, Tool use, Planning, Multi-agent collaboration
  • Diversity advantage is proven, not theoretical

Source: LeverageAI, Discovery Accelerator Architecture

Reasoning-Guided Search vs Traditional Search

The key architectural innovation is how search integrates with reasoning. Traditional AI search works like this:

❌ Traditional Approach: Search-Guided Reasoning

  1. User asks a question
  2. Search the web for relevant information
  3. LLM summarizes what was found
  4. Present summary to user

Problem: You get what the web happens to say about a broad topic, not targeted validation of specific strategic ideas. Generic information, not decision-relevant insights.

✓ Discovery Accelerator: Reasoning-Guided Search

  1. Chess engine generates specific strategic idea
  2. Generate targeted research questions FOR THAT IDEA
  3. Search for validation or contradiction of specific claims
  4. Feed findings back into scoring and exploration

Advantage: You search for what validates or challenges specific ideas, not generic information. Every search query is hypothesis-driven.

Source: LeverageAI, Reasoning-Guided Search
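A minimal sketch of the contrast, with all helpers passed in as hypothetical placeholders for the search and LLM calls:

```python
# Sketch contrasting the two search patterns. All callables are placeholders.

def search_guided_reasoning(question, web_search, summarise):
    """Traditional pattern: search broadly, then summarise whatever came back."""
    results = web_search(question)
    return summarise(results)

def reasoning_guided_search(idea, question_generator, web_search, update_score):
    """Discovery Accelerator pattern: every query validates or challenges one idea."""
    score, evidence = 0.0, []
    for q in question_generator(idea):          # targeted questions FOR THAT IDEA
        findings = web_search(q)                # hypothesis-driven query
        score = update_score(score, idea, findings)
        evidence.append({"question": q, "findings": findings})
    return {"idea": idea, "score": score, "evidence": evidence}
```

The structural difference is where the question comes from: in the first pattern the user's broad question drives the search; in the second, each specific strategic idea generates its own validation queries.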

Use Cases for Hyper Sprints

Where does this approach make the most sense? Anywhere humans are currently forced to prematurely narrow the solution space due to time or cognitive constraints.

Strategic Planning

Portfolio Selection & Scenario Planning

"Which of these 200 possible projects are we under-valuing and why? What assumptions about the future make each one succeed or fail?"

Example: Generate 500 portfolio combinations, stress-test against 20 different market scenarios, document which assumptions drive each outcome.

Workforce & Network Design

Rostering problems with hundreds of constraints—skills, availability, costs, travel logistics, regulatory requirements.

Example: Explore 10,000 schedule permutations overnight, flag optimal solutions and document trade-offs between cost, coverage, and compliance.

The Non-Prune Advantage

Here's what changes when compute is abundant: you can afford to not prematurely prune the solution space.

Previously, strategic planning meant narrowing down early just to stay sane. You'd start with 50 ideas, immediately cut to 10 "finalists," and then spend months evaluating those 10. The problem? The best option might have been in the 40 you cut before doing any real analysis.

The New Pattern:

  • Generate many candidate strategies without premature filtering
  • Stress-test all of them against different future scenarios
  • Document why each fails or succeeds under which assumptions
  • Let humans decide with full visibility into the trade-offs

Engineering Optimisation

Resource Allocation with Complex Constraints

Budget allocation across departments with interdependencies, capacity limits, strategic priorities, and political realities.

Example: Model 5,000 allocation scenarios, identify Pareto-optimal solutions, show which strategic goals are in tension and require human trade-off decisions.

Project ROI Comparison

Compare dozens of potential initiatives across multiple metrics—financial return, strategic alignment, risk, time to value, resource requirements.

Example: Score 100 projects against 15 weighted criteria, generate sensitivity analysis showing how rankings change under different strategic priorities.
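The scoring mechanics themselves are simple enough to sketch. The projects, criteria, and weights below are illustrative; the value of the hyper sprint is generating and stress-testing the inputs at scale, not the arithmetic.

```python
# Sketch: rank projects against weighted criteria and see how rankings shift
# under a different strategic weighting. Data and weights are illustrative.
# Scores are 1-10, higher is better on that criterion (a low-risk project scores high on "risk").

projects = {
    "Project A": {"financial_return": 8, "strategic_fit": 6, "risk": 4, "time_to_value": 7},
    "Project B": {"financial_return": 5, "strategic_fit": 9, "risk": 7, "time_to_value": 5},
    "Project C": {"financial_return": 7, "strategic_fit": 7, "risk": 6, "time_to_value": 8},
}

def rank(weights):
    scored = {
        name: sum(weights[c] * scores[c] for c in weights)
        for name, scores in projects.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

baseline = {"financial_return": 0.4, "strategic_fit": 0.3, "risk": 0.1, "time_to_value": 0.2}
growth_focused = {"financial_return": 0.2, "strategic_fit": 0.5, "risk": 0.1, "time_to_value": 0.2}

print("Baseline priorities:  ", rank(baseline))
print("Growth-focused shift: ", rank(growth_focused))
# Sensitivity analysis = re-ranking under many such weightings and flagging
# which projects stay near the top regardless (the robust choices).
```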

The Bottom Line

Committee-think isn't a failure of people—it's a failure of time and cognitive constraints. When you have limited meeting hours and finite human attention, you optimize for consensus and acceptability, not for finding the best answer.

Hyper sprints remove those constraints. AI doesn't sleep, doesn't need meetings, and doesn't suffer from groupthink. It explores thousands of possibilities systematically, preserves the full audit trail, and lets humans make decisions from a position of visibility.

"The chess analogy holds: AI explores, humans set objectives and decide. The game hasn't changed—we've just expanded the number of moves we can consider before committing."
— Scott Farrell

Chapter Takeaways

  • Committee-think optimizes for consensus under time pressure, not for finding the best answer. Groupthink, political navigation, and premature pruning are structural constraints, not people failures.
  • Hyper sprints optimize for search coverage and idea quality. Thousands of possibilities explored overnight, full audit trail preserved, politics happen after seeing the landscape.
  • Extended thinking enables deeper reasoning when given inference-time compute. Small model + thinking time can outperform 14× larger model + instant response.
  • Multi-model councils achieve 97% accuracy vs 80% single-model. Diversity advantage is proven—specialized models debate from different perspectives (operations, revenue, risk, culture).
  • The chess analogy holds: AI explores, humans set objectives and decide. You're not asking AI to know the answer—you're asking it to systematically explore more possibilities than humans have time for.
  • Tasks that took cross-functional teams weeks become overnight runs with 200+ iterations. AI agents don't sleep, don't need coordination meetings, and don't suffer from groupthink.

Marketplace of One

Why do we segment customers? Because treating each one individually was too expensive. That constraint has changed.

The Historical Constraint

Why Segmentation Exists

For decades, the economics of marketing and service delivery forced a fundamental compromise: we segment customers into groups and design around the "average customer in segment X." It wasn't the ideal approach—it was the only feasible approach.

The reasoning was sound:

  • Too cognitively expensive to treat every customer individually
  • Policies designed around demographic group averages
  • Campaigns built for standardised segments
  • Support flows optimised for operational efficiency

What Gets Lost

The trade-off was predictable and painful:

  • Individual preferences flattened to segment averages
  • Outliers poorly served by standardised approaches
  • One-size-fits-most becomes one-size-fits-none
  • Opportunities for personalised value creation left on the table

$1 Trillion

The estimated value shift from standardisation to personalisation across US industries alone.

Companies that grow faster drive 40% more revenue from personalisation than their slower-growing counterparts. More than 70% of consumers now consider personalisation a basic expectation—not a premium feature.

McKinsey & Company

The Economic Shift

The constraint that justified segmentation—the prohibitive cost of individual treatment—has fundamentally changed. Research from McKinsey quantifies what many companies are beginning to discover: personalisation at scale represents one of the largest value-creation opportunities in modern business.

The Revenue Impact

The numbers are compelling:

  • Personalisation typically drives 10–15% revenue lift
  • Company-specific lift ranges from 5–25%, driven by sector and execution capability
  • Shifting to top-quartile performance would generate over $1 trillion in value across US industries
McKinsey & Company
"AI doesn't optimise for average—it adapts to context. Every interaction can be unique. Every path can be recalculated. Every response can be personalised."

The Cost Structure Flip

What has changed is fundamental: the economics of personalisation have inverted.

Previously: Customisation was expensive. Manual effort scaled linearly with customer count. Individual treatment required prohibitive human resources.

Now: Recomputing per-customer recommendations costs less than maintaining rigid, segment-based rules, which demand constant exception handling and manual overrides and still leave customers frustrated.

The human couldn't manage the combinatorial complexity of thousands of individual customer profiles, each with unique history, preferences, and context. AI can track per-customer context, state, and behavioural patterns—and remain coherent.
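As a minimal sketch of what "track per-customer context and recompute" can look like, here is an illustrative data structure with a placeholder recommendation function; in practice the recommendation step is an AI call over much richer context.

```python
# Sketch: per-customer context instead of segment rules.
# recommend_for() is a hypothetical placeholder for an AI/model call.
from dataclasses import dataclass, field

@dataclass
class CustomerContext:
    customer_id: str
    interaction_history: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)
    open_issues: list = field(default_factory=list)

def recommend_for(ctx: CustomerContext) -> dict:
    """Placeholder: in practice this is a model call over the full context."""
    channel = ctx.preferences.get("channel", "email")
    next_action = "address open issue" if ctx.open_issues else "tailored offer"
    return {"customer": ctx.customer_id, "channel": channel, "next_action": next_action}

# Segment rules would bucket both customers identically; per-customer recompute does not.
alice = CustomerContext("alice", preferences={"channel": "sms"}, open_issues=["billing dispute"])
bob = CustomerContext("bob", preferences={"channel": "email"})
print(recommend_for(alice))
print(recommend_for(bob))
```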

What Becomes Possible

Per-Customer Design

When the constraint of cognitive overhead disappears, entirely new design patterns become rational:

Marketplace of One: Use Cases

Marketing
  • Per-customer campaign messaging
  • Dynamic creative optimisation
  • Individualised timing and channel selection
  • Offer personalisation beyond segment rules
Service
  • Personalised support flows
  • Individual escalation thresholds
  • Communication preference matching
  • Proactive outreach based on individual patterns
Pricing & Risk
  • • Dynamic pricing based on individual behaviour
  • • Personalised risk assessment
  • • Custom terms and conditions
  • • Individual credit decisions

Mass Personalisation vs Mass Customisation

The distinction matters:

  • Mass customisation: Caters to the needs of large user cohorts and their special requirements
  • Personalisation: Focuses on the needs of a particular individual

With advanced AI technology, achieving an intimate understanding of individual customer needs has become both realistic and financially promising.

Intellias

The Results Data

Hyper-Personalisation Performance (2025)

Businesses leveraging AI-driven personalisation at scale are reporting dramatic performance improvements:

  • 62% increase in engagement rates
  • 80% improvement in conversion rates

Compared to traditional segment-based approaches

AI Magicx, 2025

What GenAI Enables

The capabilities that make marketplace-of-one feasible:

  • Real-time data analysis across vast customer datasets
  • Aspiration and behaviour identification—not just stated needs
  • Proactive trend anticipation and challenge resolution
  • True individualisation—moving beyond traditional customer segmentation

Always-On Sense-Making

One of the most powerful applications of marketplace-of-one thinking extends beyond customer-facing interactions:

  • Continuously reading tickets, emails, chats, documents, and logs
  • Spotting emerging problems, patterns, and opportunities as they develop
  • Proposing hypotheses: "It looks like X might be happening because of Y"
  • Acting as a standing "organisational brain" that never gets bored

You're not going to assign a human team to read everything, all the time—they'd mutiny. But an AI system can be a persistent sense-making layer across your organisation that never fatigues.

This Is Version 3

Not Automation

Marketplace of one represents Version 3 AI value creation:

  • × Not Version 1: We're not doing old work faster
  • × Not Version 2: We're not just applying more thinking to existing problems
  • Version 3: We're creating new work that was never feasible—a new class of product and service design

What It Requires

Marketplace-of-one isn't plug-and-play. It requires genuine capability building:

Data Infrastructure

Systems to track, store, and retrieve individual customer context at scale. Not just transaction history—behavioural patterns, preference signals, and interaction context.

AI Systems

Models capable of processing per-customer recommendations in real-time or near-real-time. The ability to compute thousands of individualised responses efficiently.

Business Processes

Operational workflows that can receive and act on individual recommendations. Systems flexible enough to handle per-customer variation without breaking.

Governance

Frameworks for personalised decisions at scale. Ensuring fairness, compliance, and auditability when every customer receives unique treatment.

The Strategic Question

"What would we design if we had a smart assistant assigned to every customer—and every employee?"

The answers to that question represent the marketplace-of-one opportunity. They're the products, services, and experiences you currently dismiss as "too complex to manage." They're Version 3.

Key Takeaways

  • Segmentation exists because individual treatment was too expensive—that constraint has changed
  • $1 trillion opportunity in personalisation across US industries (McKinsey)
  • Cost structure has flipped: per-customer computing now cheaper than one-size-fits-none
  • Results: 62% higher engagement, 80% better conversion (AI Magicx)
  • Marketplace of one = Version 3: new work that wasn't feasible before

AI as Cognitive Exoskeleton

The pattern that works across all three versions of AI value isn't about replacement. It's about amplification.

AI does the pre-work. Humans own the moment.

TL;DR

  • Medical diagnostics improve from 72% to 80% sensitivity with AI assistance—not replacement, amplification
  • Multi-agent orchestration delivers 90.2% improvement over single-agent systems in research tasks
  • The cognitive exoskeleton pattern: AI saturates pre-work, human owns judgment and relationships
  • Token economics: multi-agent systems use 15x more tokens, so deploy on high-value tasks only

The Mental Model Shift

The difference between Version 1 and Version 3 thinking comes down to where you place AI in the workflow.

From Brittle Autonomy to Robust Augmentation

Old Mental Model
  • • "AI answers the customer"
  • • Fragile, one-shot, high failure rate
  • • Fighting the latency-accuracy trade-off
  • • 72% say chatbots waste time
  • • Human escalation = system failure
New Mental Model
  • • "AI does everything leading up to the moment where the human answers"
  • • Robust, augmentative, plays to strengths
  • • Human keeps judgment and relationships
  • • 90.2% improvement with orchestration
  • • Human escalation = system design
"The real power of AI lies in amplification, not automation. It doesn't remove human input—it multiplies its impact."
— Anthony Coppedge, AI as Exoskeleton

The Pre-Work Pattern

Instead of asking AI to handle the customer interaction, ask it to prepare the human who will.

What AI Can Do Before the Human Acts

When a customer message arrives, AI can:

  • Mine CRM for truly relevant past interactions and summarize context
  • Infer what the customer probably cares about based on history and current message
  • Pull knowledge base articles, policies, and similar resolved cases
  • Surface a rich cockpit with context, suggested actions, draft responses, and risks to watch for

What This Gives the Human

Faster

Less clicking through screens and searching through documentation. The human agent sees a prepared dashboard instead of scattered data sources.

More Accurate

Better context than they'd find alone. AI can process thousands of past interactions to surface the three that actually matter for this customer's situation.

Still in Control

The human owns judgment and relationship handling. They see the AI's suggestions as inputs, not commands. They bring tacit knowledge, social intelligence, and recovery from error.
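A minimal sketch of that pre-work pipeline, assembling the cockpit before the human picks up the conversation. The four callables are hypothetical placeholders for the CRM query, knowledge-base retrieval, and LLM calls the text describes.

```python
# Sketch: AI prepares a "cockpit" for the human agent before a live interaction.
# All callables are hypothetical placeholders for CRM queries, retrieval and LLM calls.

def build_cockpit(customer_id, incoming_message,
                  crm_lookup, knowledge_search, summarise, draft_reply):
    """Assemble everything the human needs before they own the moment."""
    history = crm_lookup(customer_id)                       # past interactions
    context = summarise(history, incoming_message)          # what this customer cares about
    articles = knowledge_search(incoming_message, context)  # policies, similar resolved cases
    return {
        "context_summary": context,
        "relevant_history": history[-3:],     # the few interactions that actually matter
        "suggested_articles": articles,
        "draft_response": draft_reply(context, articles),
        "risks_to_watch": [a for a in articles if a.get("risk_flag")],
    }

# Illustrative wiring with stub callables:
cockpit = build_cockpit(
    customer_id="cust-42",
    incoming_message="My renewal invoice looks wrong",
    crm_lookup=lambda cid: [{"date": "2025-06-01", "summary": "billing query"}],
    knowledge_search=lambda msg, ctx: [{"title": "Invoice corrections policy", "risk_flag": False}],
    summarise=lambda hist, msg: "Repeat billing concern; values accuracy over speed",
    draft_reply=lambda ctx, arts: "Hi, I've reviewed your renewal invoice...",
)
print(cockpit["context_summary"])
# The human sees the cockpit as input, not as a command: they still own the
# judgment call and the relationship.
```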

The Evidence for Augmentation

This isn't theory. The medical field provides the cleanest evidence that augmentation outperforms replacement.

Medical Results

Medical professionals using AI-enhanced diagnostics demonstrate significant performance improvements. Studies show AI assistance increasing diagnostic sensitivity from 72% to 80% and specificity from 81% to 85% for fracture detection, with 91.3% sensitivity for lesion detection compared to 82.6% for human-only interpretation. AI reduces diagnostic time significantly.

Source: EY, Human-Machine Economy report

Those numbers tell the story: AI alone isn't better than humans alone. But AI assisting humans beats either party working solo.

The Physical Exoskeleton Parallel

AI-powered exoskeletons and wearable robotics can augment human strength and endurance. The Exia model by German Bionic is the first AI-augmented exoskeleton: it captures billions of biomechanical data points, learns from user movements, and delivers up to 38 kg of adaptive lifting assistance.

Source: e-Novia, Human Augmentation Technologies report

The physical exoskeleton doesn't replace the human worker. It amplifies their capability. The same principle applies to cognitive work.

Brain Cache Research

Brain Cache, a Generative AI-powered cognitive exoskeleton acting as a second brain for humans, achieves cognitive augmentation through three mechanisms: externalizing biological memory, structuring knowledge, and activating insights. By creating a mirror system that externalizes, reorganizes, and reactivates knowledge in rhythm with biological learning cycles, we enable humans to consciously participate in their own cognitive evolution.

Source: MIT, GenAI & HCI Conference Paper

Multi-Agent Orchestration

The augmentation pattern doesn't stop at one AI assisting one human. It extends to teams of AI agents coordinating to amplify a single human's capability.

The Performance Data

We found that a multi-agent system with Claude Opus 4.5 as the lead agent and Claude Sonnet 4.5 subagents outperformed single-agent Claude Opus 4.5 by 90.2% on our internal research eval. Our internal evaluations show that multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously.

Source: Anthropic, Multi-Agent Research System

That 90.2% improvement isn't a typo. It's the difference between one AI trying to do everything and a coordinated team with specialized roles.

Enterprise Results

The pattern scales beyond research tasks. Enterprises deploying orchestrator-worker multi-agent patterns in sales, finance, and support see up to 30% increases in process efficiency, with error rates reduced by up to 25%.

How It Works

A central orchestrator agent uses an LLM to plan, decompose, and delegate subtasks to specialized worker agents or models, each with a specific role or domain expertise. This mirrors human team structures and supports emergent behavior across multiple agents.

Multi-Agent Architecture Pattern

Orchestrator Agent

  • Receives task from human
  • Decomposes into specialized subtasks
  • Delegates to worker agents
  • Synthesizes results back to human

Worker Agents

  • Each has domain expertise (sales, legal, technical, etc.)
  • Execute narrow, well-defined tasks
  • Return results to orchestrator
  • Can be different models optimized for cost/performance

Human Role

  • Sets strategic direction
  • Reviews synthesized options
  • Makes final judgment calls
  • Handles relationships and accountability
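
A minimal sketch of the orchestrator-worker shape described above. The decomposition, worker logic, and synthesis functions are placeholders standing in for LLM calls, and the worker domains are illustrative.

```python
# Sketch of the orchestrator-worker pattern. decompose(), run_worker() and
# synthesise() are hypothetical placeholders for LLM calls.
from concurrent.futures import ThreadPoolExecutor

WORKERS = ["sales", "legal", "technical"]   # illustrative domain specialists

def decompose(task):
    """Orchestrator: split the task into one subtask per specialist."""
    return {w: f"{w} analysis of: {task}" for w in WORKERS}

def run_worker(domain, subtask):
    """Worker agent: narrow, well-defined execution within one domain."""
    return f"[{domain}] findings for '{subtask}'"

def synthesise(results):
    """Orchestrator: merge worker outputs into one briefing for the human."""
    return "\n".join(results.values())

def orchestrate(task):
    subtasks = decompose(task)
    with ThreadPoolExecutor() as pool:          # workers run in parallel
        futures = {d: pool.submit(run_worker, d, s) for d, s in subtasks.items()}
        results = {d: f.result() for d, f in futures.items()}
    return synthesise(results)                  # the human reviews this and decides

if __name__ == "__main__":
    print(orchestrate("Assess acquisition target X"))
```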

Token Economics Reality

In our data, agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats. For economic viability, multi-agent systems require tasks where the value of the task is high enough to pay for the increased performance.

Source: Anthropic, Token Economics Research

This is the trade-off: 15x token cost for 90.2% performance improvement. It makes economic sense for high-stakes decisions (M&A analysis, strategic planning, complex technical architecture). It doesn't make sense for routine email summaries.
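The break-even intuition fits in a few lines. The token counts and price below are illustrative assumptions, not pricing guidance.

```python
# Back-of-envelope token economics. All numbers are illustrative assumptions.
chat_tokens = 5_000                    # typical chat interaction
multi_agent_tokens = chat_tokens * 15  # ~15x token usage (the reported ratio)
price_per_million = 10.0               # USD per million output tokens (illustrative)

chat_cost = chat_tokens / 1_000_000 * price_per_million
multi_agent_cost = multi_agent_tokens / 1_000_000 * price_per_million

print(f"chat:        ${chat_cost:.2f}")         # ~$0.05
print(f"multi-agent: ${multi_agent_cost:.2f}")  # ~$0.75
# Even at 15x, the absolute token cost is small; the real question is whether the
# task's value justifies the heavier orchestration, engineering and review overhead.
```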

Where This Pattern Applies

The cognitive exoskeleton pattern works anywhere humans face high-stakes, time-sensitive interactions that benefit from exhaustive preparation.

Live Customer Interactions

  • Sales calls: AI surfaces account history, suggested talking points, competitor intel, pricing options
  • Support: AI prepares context, root cause analysis, suggested solutions, escalation triggers
  • Negotiations: AI analyzes options, precedent deals, risk scenarios—human makes judgment calls

Internal Workflows

  • Approvals: AI prepares cost-benefit analysis, risk assessment, compliance check—human signs off
  • Risk reviews: AI surfaces patterns across thousands of transactions—human makes assessment
  • Strategic decisions: AI explores 100+ options systematically—human chooses direction

Professional Services

  • Medical consults: AI prepares differential diagnosis context, relevant research, similar cases
  • Legal work: AI surfaces relevant precedent, contract language, risk flags—lawyer makes judgment
  • Financial advice: AI prepares portfolio analysis, scenario modeling, tax implications

The Common Thread

In all these cases:

  • The interaction is high-stakes and time-sensitive
  • AI saturates the pre-work and side-work
  • The human owns the moment of judgment
  • Relationships and accountability stay with the human

Not Brittle Autonomy

The cognitive exoskeleton pattern succeeds where chatbot autonomy fails because it isn't forcing AI into a one-chance-to-be-perfect role.

What Each Party Brings

What Humans Bring

  • Tacit knowledge
  • Social intelligence and relationship handling
  • Judgment under ambiguity
  • Recovery from error
  • Accountability

What AI Brings

  • Exhaustive search and retrieval
  • Pattern matching at scale
  • Consistent application of criteria
  • Parallel processing across data sources
  • Tireless attention to detail

Each party plays to their strengths. AI doesn't need to be perfect because it isn't making the final call. Humans don't need to manually search through thousands of records because AI has already done that work.

The result: faster, more accurate, and more robust than either party working alone.

Chapter Takeaways

  • Shift from "AI answers" to "AI does everything leading up to the answer"
  • Medical evidence: 72% to 80% diagnostic sensitivity with AI assistance—8 percentage point improvement from augmentation
  • Multi-agent orchestration delivers 90.2% improvement over single-agent systems
  • Token cost reality: multi-agent = 15x chat cost, so deploy on high-value tasks only
  • The exoskeleton pattern applies everywhere: sales, support, medical, legal, strategic planning
  • Not replacement—amplification. AI brings exhaustive search and pattern matching; humans bring judgment and accountability

The Right Questions

You've seen the deployment matrix. You understand the three versions of AI value. You know why chatbots fail and where batch processing wins. Now comes the implementation reality check.

Because "$1 per hour AI" is a fantasy. And the questions most organisations ask point them directly toward the 95% failure rate.

The Real AI On-Costs

Per-token pricing creates the illusion that AI is cheap. GPT-4o costs $10 per million output tokens. That sounds like pennies. But AI doesn't run on tokens alone—it requires an ecosystem to operate. The true costs look less like hiring a contractor and more like hiring a department.

Toolchain & Infrastructure

What you need: Retrieval systems (vector stores for RAG), orchestration platforms to coordinate multi-step workflows, observability and logging infrastructure to track what's happening.

Reality check: These aren't optional. Without them, you're flying blind.

Operations & Monitoring

What you need: Logs, alerts, dashboards showing token consumption and costs, anomaly detection for model drift, performance monitoring across the stack.

Cost example: In research contexts, monthly spend on evaluation and tooling ranges from $31,300 to $58,000. Enterprises need to alert if budgets deviate by more than 5%.
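A minimal sketch of that kind of budget-deviation alert, with illustrative figures:

```python
# Sketch: alert when actual AI spend drifts more than 5% from budget.
# Thresholds and figures are illustrative assumptions.

def check_budget(actual_spend: float, budgeted_spend: float,
                 threshold: float = 0.05) -> bool:
    """Return True (and alert) if spend deviates from budget by more than threshold."""
    deviation = abs(actual_spend - budgeted_spend) / budgeted_spend
    if deviation > threshold:
        print(f"ALERT: spend ${actual_spend:,.0f} deviates "
              f"{deviation:.1%} from budget ${budgeted_spend:,.0f}")
        return True
    return False

check_budget(actual_spend=61_500, budgeted_spend=58_000)   # 6.0% over -> alert
check_budget(actual_spend=32_000, budgeted_spend=31_300)   # 2.2% over -> no alert
```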

Governance & Risk

What you need: Policies defining which models can be used and when, approval workflows for high-stakes deployments, audit records for regulators and compliance teams.

Reality check: Risk officers start sweating when they can't trace how a decision was made.

Model & Prompt Maintenance

What you need: Ongoing prompt tuning as models and products evolve, model version management and testing, workflow updates as business processes change.

Reality check: Workflows decay. Products change. Prompts that worked last quarter may not work next quarter.

Change Management

What you need: Training staff on new workflows, updating standard operating procedures, handling organisational politics when AI changes how people work.

Reality check: Technology is the easy part. People are the hard part.

This is why successful AI organisations don't just buy software—they build capabilities. And that takes investment in infrastructure, process, and people.

What Success Actually Looks Like

The 70/30 Split

Organisations achieving value from AI invest 70% of their AI resources in people and processes, not just technology. Change management is as important as model selection. The companies treating AI as purely a software purchase are the ones showing up in the failure statistics.

Technology is 30% of the solution. The other 70% is people, process, governance, and organisational change.

ROI Timeline Reality

Most organisations achieve satisfactory ROI within 2-4 years—much longer than typical 7-12 month software payback periods. Companies that moved early into GenAI adoption report $3.70 in value for every dollar invested, with top performers achieving $10.30 returns per dollar. But that return takes patience. Quick wins need to fund a longer journey.

High Performer Characteristics

AI high performers share common patterns:

  • They commit 20%+ of digital budgets to AI
  • They implement human oversight for critical applications
  • They set growth or innovation as objectives, not just efficiency
  • They redesign workflows rather than bolting AI onto existing processes

Half of AI high performers intend to use AI to transform their businesses, and most are redesigning workflows. They're not asking "where can we swap humans for AI?" They're asking "what becomes possible when thinking scales?"

The Risks Beyond Hallucinations

Hallucination gets all the press. The model makes something up. Everyone panics. Risk committees demand guardrails. But hallucination is just a symptom—a model making something up once, in a way that's visible.

The deeper risks are systemic, silent, and far more dangerous.

"Hallucination is just a model making something up once. The real risk is an AI system being wrong consistently and invisibly for months."

This is why observability isn't optional. This is why governance frameworks matter. This is why the on-costs are real. You're not just deploying a model—you're deploying a system that needs monitoring, maintenance, and accountability structures.

The Three Questions

Most organisations start AI projects by asking the wrong question. Here's how the progression should actually work.

Decision Path: From Wrong to Best

❌ Wrong Question

"Where can we put a chatbot?"

  • Technology-first thinking
  • Ignores the deployment matrix entirely
  • Leads directly to the 95% failure rate

Outcome: Pilots that don't scale, frustrated users, abandoned initiatives.

⚠️ Better Question

"Where do we waste human thinking time on work that's slow, repetitive, or queued up?"

  • Identifies Version 2 opportunities (100x thinking applied)
  • Finds the batch/queue sweet spots on the deployment matrix
  • Focuses on proven ROI patterns

Outcome: Efficiency gains, measurable ROI, incremental transformation.

✓ Best Question

"What thinking have we never even attempted because the coordination overhead was too high?"

  • Identifies Version 3 frontier (previously impossible work)
  • Finds opportunities that transform capability, not just efficiency
  • Points toward strategic differentiation

Outcome: New capabilities, competitive advantage, business transformation.

The wrong question leads to chatbot failures. The better question leads to efficiency gains. The best question leads to transformation. Most organisations never get past the first question—which is why most AI projects fail.

3 Questions to Ask Before Any AI Project

1. Where on the 2×2?

Map your use case: Latency tolerance × Error cost

  • If left side (low latency tolerance): human-led, AI-assisted only
  • If bottom-right (high latency tolerance, low error cost): prime AI territory

2. Which version of value?

  • Version 1: Same work, fewer people (highest failure rate)
  • Version 2: More thinking at same problems (proven ROI)
  • Version 3: Previously impossible work (transformative frontier)

3. What's the on-cost reality?

Infrastructure, monitoring, governance, maintenance, change management

Can this use case bear a 2-4 year ROI timeline with 70% of investment in people and process?
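The three questions can be captured as a lightweight pre-project triage. The rules below are a simplified paraphrase of the framework, not a scoring model.

```python
# Lightweight triage for a proposed AI use case, paraphrasing the three questions.
# Categories and rules are a simplification of the framework, not a substitute for judgment.

def triage(latency_tolerance: str, error_cost: str, value_version: int,
           can_fund_on_costs: bool) -> str:
    # Question 1: where on the 2x2?
    if latency_tolerance == "low":
        return "Human-led, AI-assisted only (left side of the matrix)"
    if error_cost == "high":
        return "Proceed only with human review at critical points"
    # Question 3: the on-cost reality
    if not can_fund_on_costs:
        return "Defer: the use case cannot bear the 2-4 year ROI timeline"
    # Question 2: which version of value?
    return {
        1: "Version 1: proceed cautiously, expect high failure risk",
        2: "Version 2: invest, proven ROI pattern",
        3: "Version 3: strategic bet, start small and learn fast",
    }[value_version]

print(triage("high", "low", 2, can_fund_on_costs=True))
```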

Implementation Best Practices

Key success factors include starting with high-impact processes, investing in change management, ensuring data quality, and planning for continuous improvement rather than one-time implementation.

Start Right

  • High-impact, low-complexity processes: Don't start with your hardest problem
  • Clear ROI that can be measured: Define success metrics before you begin
  • Business value focus: Not technology showcase

The Success Pattern

Tools that succeeded shared two traits: low configuration burden and immediate, visible value. In contrast, tools requiring extensive enterprise customisation often stalled at pilot stage.

  • Embed in workflows, adapt to context
  • Scale from narrow but high-value footholds
  • Avoid requiring extensive setup before users see value

The Shift

The organisations that succeed with AI aren't the ones with the biggest technology budgets. They're the ones who changed how they think about the problem.

From Version 1 Thinking
  • "Where can we automate?"
  • Competing with humans
  • Technology as cost reduction
  • 95% failure rate
To Version 3 Thinking
  • "What becomes rational when cognition is abundant?"
  • New capability creation
  • Technology as capability infrastructure
  • Transformative potential

This shift—from AI as expensive experiment to AI as infrastructure for thinking—changes everything. It changes budget conversations. It changes risk assessment. It changes which projects get greenlit and which get killed.

Most importantly, it changes the questions you ask.

The Question to Take Away

So here's the question worth asking yourself:

"What's one project you've never attempted because the coordination overhead was too high?"

Because it would take too many people. Because it would require too much time. Because the meetings alone would sink it. Because the cognitive overhead of keeping everyone aligned would consume more energy than the work itself.

That project—the one you've never attempted—might be your Version 3.

Not because AI will do it for you. But because AI changes the coordination economics. It changes what's rational to attempt. It changes the threshold where "too expensive to think about" becomes "worth exploring."

"Once you show them that second bucket, project ideas stop being 'let's bolt AI onto X' and start becoming 'what would we do if cognition was essentially abundant?'"

That's the shift. That's the opportunity. And that's the question worth answering.

Ready to Identify Your Version 3?

The organisations transforming with AI aren't the ones with the biggest budgets. They're the ones asking better questions.

Start with the deployment matrix. Map your use cases. Identify where you're wasting thinking time. Then ask what you've never attempted—and explore whether AI changes the economics enough to make it rational.

Scott Farrell helps organisations move from Version 1 automation thinking to Version 3 capability building. Connect on LinkedIn to explore how AI changes what's possible for your organisation.

Key Takeaways

  • AI on-costs are substantial: infrastructure, monitoring, governance, maintenance, and change management all add up
  • Success requires 70% investment in people and process, with 2-4 year ROI timelines
  • The real risks aren't hallucinations—they're systemic wrongness and silent drift
  • Wrong question: "Where can we put a chatbot?" Better question: "Where do we waste thinking time?" Best question: "What thinking was never feasible before?"
  • Start high-impact/low-complexity, redesign workflows, focus on business value not technology showcase
  • The shift from "AI as automation" to "AI as infrastructure for thinking" changes which projects become rational to attempt

References & Sources

This ebook draws on enterprise AI research, industry surveys, academic studies, and practitioner insights compiled in late 2024 and early 2025. Where statistics or frameworks are cited, the primary source is noted. The author's interpretive frameworks integrate patterns observed across multiple engagements and sources.

Primary Research & Global Surveys

McKinsey Global Survey on AI, November 2025

State of AI adoption statistics, high-performer characteristics, workflow redesign data

https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

MIT Media Lab: "The State of AI in Business 2025"

95% AI pilot failure rate analysis, P&L impact measurement, workflow alignment findings

https://complexdiscovery.com/why-95-of-corporate-ai-projects-fail-lessons-from-mits-2025-study/

S&P Global Market Intelligence: Enterprise AI Survey 2025

42% abandonment rate, POC-to-production gap data, regional adoption patterns

https://workos.com/blog/why-most-enterprise-ai-projects-fail-patterns-that-work

RAND Corporation: AI Project Failure Analysis

AI projects fail at 2× rate of non-AI tech, root cause analysis, misalignment patterns

https://www.rand.org/pubs/research_reports/RRA2680-1.html

Google Cloud ROI Study, September 2025

74% of executives achieved ROI within first year, adoption benchmarks

https://www.punku.ai/blog/state-of-ai-2024-enterprise-adoption

Consulting Firms & Industry Analysis

BCG: Enterprise AI Capabilities Assessment (Late 2024)

4% cutting-edge adoption, 74% yet to show tangible value despite investment

https://agility-at-scale.com/implementing/roi-of-enterprise-ai/

McKinsey: Personalisation Value Analysis

$1 trillion US market shift, 10-15% revenue lift, 40% revenue advantage for fast growers

https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying

Gartner: AI Software Spending Forecast

$300 billion by 2027, CFO accountability expectations

https://agility-at-scale.com/implementing/roi-of-enterprise-ai/

NTT Data: GenAI Deployment Failure Analysis

70-85% failure rate findings, retail vs custom AI tools comparison

https://www.nttdata.com/global/en/insights/focus/2024/between-70-85p-of-genai-deployment-efforts-are-failing

Technical Research & AI Systems

Anthropic: Multi-Agent Research Systems

90.2% performance improvement with orchestration, token economics (4× agents, 15× multi-agent)

https://www.anthropic.com/research/building-effective-agents

IESE: Chain-of-Thought Reasoning Breakthrough

Extended thinking, test-time compute scaling, small model outperforming 14× larger models

https://blog.iese.edu/artificial-intelligence-management/2024/chain-of-thought-reasoning-the-new-llm-breakthrough/

Hugging Face: Test-Time Compute Analysis

Inference-time reasoning, o1 architecture insights

https://huggingface.co/blog/Kseniase/testtimecompute

MIT GenAI & HCI: Brain Cache Cognitive Exoskeleton

AI as second brain, cognitive augmentation mechanisms

https://generativeaiandhci.github.io/papers/2025/genaichi2025_51.pdf

Galileo AI: Real-Time vs Batch Monitoring for LLMs

Latency-accuracy trade-offs, false positive rate analysis

https://galileo.ai/blog/llm-monitoring-real-time-batch-approaches

Zen van Riel: Real-Time vs Batch Processing Architecture

40-60% cost savings analysis, infrastructure comparison

https://zenvanriel.nl/ai-engineer-blog/should-i-use-real-time-or-batch-processing-for-ai-complete-guide/

Customer Experience & Chatbot Research

Forbes / UJET: Chatbot Frustration Survey

72% "waste of time", 78% escalate to human, 63% no resolution

https://www.forbes.com/sites/chriswestfall/2022/12/07/chatbots-and-automations-increase-customer-service-frustrations-for-consumers-at-the-holidays/

Johns Hopkins Carey: Hurdles to AI Chatbots

Gatekeeper aversion, priority queue solutions

https://carey.jhu.edu/articles/hurdles-ai-chatbots-customer-service

Nature: Consumer Trust in AI Chatbots

Service failure attribution, anthropomorphism effects

https://www.nature.com/articles/s41599-024-03879-5

WorkHub: Top 7 Reasons Chatbots Fail

Weak escalation protocols, incomplete knowledge bases

https://workhub.ai/chatbots-fail-in-customer-service/

Human Augmentation & Medical AI

EY: Human-Machine Hybrid Economy

Diagnostic sensitivity improvement (72% → 80%), AI-enhanced medical performance

https://www.ey.com/en_us/megatrends/how-emerging-technologies-are-enabling-the-human-machine-hybrid-economy

AI Magicx: Hyper-Personalisation at Scale (2025)

62% engagement increase, 80% conversion improvement

https://aimagicx.com/blog/hyper-personalization-ai-customer-experiences-2025/

e-Novia: Human Augmentation Physical AI

German Bionic Exia case study, cognitive-physical augmentation parallels

https://e-novia.it/en/news/human-augmentation-technologies-physical-ai-industry-healthcare/

Group Dynamics & Decision-Making

NYU Steinhardt: Groupthink as System

Four dimensions of defective decisions, cohesion-performance relationship

https://wp.nyu.edu/steinhardt-appsych_opus/groupthink/

ANZSOG: Effective Committee Work

Time constraints on knowledge sharing, preference negotiation patterns

https://anzsog.edu.au/app/uploads/2022/06/10.21307_eb-2018-002.pdf

ASPPA: Investment Committee Groupthink

Committee dynamics impact on portfolio performance

https://www.asppa-net.org/news/2019/5/how-investment-committees-can-avoid-groupthink/

Case Studies & ROI Examples

Klarna AI Assistant Case Study

$40M annual benefit, 700-agent equivalent workload, ticket-based (not live chat) deployment

https://www.articsledge.com/post/ai-software-business

Nasdaq Data Quality Implementation

90% reduction in time on data quality issues, $2.7M savings

https://www.montecarlodata.com/blog-ai-observability/

Enterprise AI Investment Breakdown

Average $6.4M annual spend across software, talent, infrastructure, training

https://www.secondtalent.com/resources/ai-adoption-in-enterprise-statistics/

LeverageAI / Scott Farrell

Practitioner frameworks and interpretive analysis developed through enterprise AI transformation consulting. These materials inform the conceptual frameworks presented in this ebook.

Discovery Accelerators: The Path to AGI Through Visible Reasoning Systems

Three-layer architecture (Director AI, Council of Engines, Chess-Style Reasoning), reasoning-guided search patterns

https://leverageai.com.au/wp-content/media/Discovery_Accelerators_The_Path_to_AGI_Through_Visible_Reasoning_Systems_ebook.html

Stop Replacing People, Start Multiplying Them: The AI Augmentation Playbook

Augmentation flywheel concept, week-by-week transformation patterns

https://leverageai.com.au/wp-content/media/Stop_Replacing_People_Start_Multiplying_Them_The_AI_Augmentation_Playbook_ebook.html

The Team of One: Why AI Enables Individuals to Outpace Organisations

Multi-agent performance data, marketplace-of-one economics, solopreneur capability analysis

https://leverageai.com.au/wp-content/media/The_Team_of_One_Why_AI_Enables_Individuals_to_Outpace_Organizations_ebook.html

Stop Automating. Start Replacing.

Cost structure flip concept, per-customer economics analysis

https://leverageai.com.au/wp-content/media/Stop_Automating_Start_Replacing_ebook.html

The Agent Token Manifesto

Hypersprint concept, overnight iteration patterns, agent economics

https://leverageai.com.au/wp-content/media/The_Agent_Token_Manifesto.html

The AI Think Tank Revolution

Multi-agent reasoning systems, specialised AI council patterns

https://leverageai.com.au/wp-content/media/The_AI_Think_Tank_Revolution_ebook.html

Note on Research Methodology

This ebook synthesises research from multiple source categories: global enterprise surveys (McKinsey, BCG, S&P Global), academic research (MIT, NYU, Johns Hopkins), industry analysis (Gartner, NTT Data), and practitioner insights. Statistics and quotations are attributed to their primary sources throughout the text.

The author's frameworks—including the "Maximising AI Cognition and AI Value Creation" framing, the 2×2 deployment matrix, "hyper sprints," "marketplace of one," and "cognitive exoskeleton" concepts—represent interpretive synthesis developed through enterprise AI consulting engagements. These are presented as the author's analytical lens rather than as external research findings.

Research compiled: November–December 2025
Note: Some linked resources may require subscription access. URLs were verified at time of publication but may change.