Maximising AI Cognition and AI Value Creation
Why Most Projects Fail and How to Find the Frontier
A framework for enterprise AI deployment that plays to the technology's strengths
By Scott Farrell
What You'll Learn
- ✓ Why 70-85% of AI projects fail—and the pattern behind the 15% that succeed
- ✓ The 2×2 framework that predicts which deployments will work
- ✓ Three versions of AI value—and why Version 3 is the untapped frontier
- ✓ Hyper sprints, marketplace of one, and thinking that was never feasible before
The Uncomfortable Truth About AI Failure
In the boardroom of a mid-sized insurance company, the mood was triumphant. After nine months of development and a $2.3 million investment, their AI-powered customer service chatbot was ready to launch. The CEO had promised the board something impressive. Competitors were making AI announcements. The pressure was on.
Six weeks later, the same boardroom witnessed a very different conversation. Customer complaints had spiked. Net Promoter Score had dropped. Support staff were fielding escalations from customers who couldn't get simple questions answered. The chatbot was quietly pulled offline. The project became a cautionary tale whispered in other boardrooms: "Remember when they tried AI?"
This scenario isn't fictional. It's the pattern playing out across enterprises worldwide. And the scale of the problem should alarm every executive allocating capital to AI.
The Statistics That Should Alarm You
Let's start with the uncomfortable numbers:
The AI Failure Landscape
70-85% of AI projects fail to meet expected outcomes
95% of corporate AI pilots show zero return on investment
42% of companies abandoned most AI initiatives in 2025 (up from just 17% in 2024)
46% of AI proofs-of-concept scrapped, on average, before reaching production
2x the failure rate of traditional technology projects
Read those numbers again. This isn't a rounding error. This isn't a temporary growing pain. AI failure is officially the norm.
"Despite $30–40 billion in enterprise investment in generative artificial intelligence, AI pilot failure is officially the norm — 95% of corporate AI initiatives show zero return."— MIT Media Lab, "The State of AI in Business 2025"
The POC-to-Production Chasm
The gap between proof-of-concept and production deployment has become a graveyard for AI ambitions. The data reveals a troubling pattern:
- • Only 26% of organisations have the capabilities to move beyond POC to production
- • Only 6% qualify as "AI high performers" (achieving 5%+ EBIT impact)
- • 74% of companies have yet to show tangible value despite widespread investment
Think about what this means: three-quarters of enterprises investing in AI are seeing no meaningful return. They're running pilots, attending conferences, hiring consultants, buying tools—and getting nothing.
— PMI (Project Management Institute) research on AI deployment
The Root Cause: Forcing AI Into Unchanged Processes
When MIT's Media Lab systematically reviewed over 300 publicly disclosed AI initiatives, a pattern emerged. The research, led by Aditya Challapally, identified the core failure mechanism:
"Most AI efforts falter due to a lack of alignment between technology and business workflows. Companies have attempted to force generative AI into existing processes with minimal adaptation."— MIT Media Lab, 2025
RAND Corporation's analysis reinforces this finding: misunderstandings and miscommunications about the intent and purpose of AI projects are the most common reasons for failure. Organizations launch AI initiatives without genuine clarity about what problem they're solving or why AI is the right tool.
— RAND Corporation, "Root Causes of AI Project Failure"
The List of Failures
Beyond the alignment problem, enterprise AI projects stumble over a predictable set of obstacles:
Common AI Project Failure Points
Data & Governance
- • Poor data hygiene and quality
- • Lack of governance frameworks
- • Data privacy and security risks
Infrastructure & Operations
- • Inappropriate internal infrastructure
- • Lack of proper AI operations capability
- • Cost overruns that impact profitability
Strategy & Execution
- • Failure to choose the right proof of concept
- • Treating AI deployment as SaaS procurement
- • Misalignment between tech and workflows
Organizational Readiness
- • Off-the-shelf tools have lower adoption than custom
- • Lack of skilled internal teams
- • Resistance to workflow redesign
Notice what's missing from this list: the AI itself. The technology isn't failing. The deployments are.
The Puzzling Contrast
Here's where the narrative gets interesting. While the majority fail spectacularly, a minority succeed spectacularly. The contrast is stark:
The AI Success Stories
$3.70 return per dollar invested for early adopters
Companies that moved into generative AI adoption early report consistent positive ROI
$10.30 return per dollar invested for top performers
High performers achieve more than 10x return on AI investments
74% achieved ROI within the first year
Google Cloud study (September 2025) shows rapid value realization is possible
$40M annual benefit from single AI assistant
Klarna's AI assistant replaced work of 700 employees, contributing estimated $40M profit improvement in 2024
Let's pause on that Klarna example. A single AI system delivering a net benefit of $35-38 million annually (after technology costs of $2-5 million). That's not incremental improvement. That's transformation.
So we face a paradox:
- → 85% of organizations see AI projects fail
- → 15% achieve returns of 3x to 10x on their investment
The difference isn't the technology. Everyone has access to the same models, the same tools, the same cloud platforms. The difference is where and how they deploy.
The Question This Book Answers
What separates the 15% that succeed from the 85% that fail?
The answer isn't:
- ✗ Better AI models (everyone has access to GPT-5, Claude, Gemini)
- ✗ Bigger budgets (failures waste millions; successes often start small)
- ✗ Better vendors (same platforms are used by winners and losers)
- ✗ More AI expertise (PhD-heavy teams fail as often as pragmatic ones)
The answer is:
Successful organizations deploy AI where it has asymmetric advantage.
They don't try to make AI work everywhere. They identify the specific contexts where AI's strengths dramatically outweigh its weaknesses—and they avoid the contexts where the opposite is true.
This book gives you the framework to make that distinction.
By the end of this book, you won't just understand why 85% of AI projects fail. You'll have a systematic way to ensure yours doesn't join them.
Key Takeaways
- • AI failure isn't the exception—it's the norm. 70-85% of projects fail, with 95% showing zero ROI. This isn't a temporary problem; it's a systematic deployment error.
- • The pattern: forcing AI into unchanged processes. Organizations try to bolt AI onto existing workflows with minimal adaptation, creating misalignment between technology capabilities and business reality.
- • Some companies achieve 10x returns. The contrast between 85% failure and 15% spectacular success reveals that deployment decisions, not technology choices, determine outcomes.
- • The answer isn't better AI; it's better deployment decisions. Success comes from identifying contexts where AI has asymmetric advantage and avoiding contexts where it doesn't.
The uncomfortable truth is that most organizations are approaching AI backwards. They're asking "Where can we use AI?" when they should be asking "Where does AI create asymmetric value?"
The next chapter shows you the economic logic behind that reframe.
The Real Carrot: Cost of Cognition
Strip away the hype. What do companies actually want from AI? "Getting on the bandwagon" isn't a business case you can take to a CFO. But underneath the noise, there's a genuine economic story—one that changes how we think about strategic capability.
If you ask a CEO why they're investing in AI, they'll rarely say the quiet part out loud. But when pressed—when you get past the "innovation agenda" and the "staying competitive" rhetoric—the answer almost always comes down to economics.
And the economics aren't about robots. They're about thinking.
From "AI" to "Cheap Cognition"
Here's the reframe that cuts through: firms buy AI because they believe thinking per hour is going to be way cheaper than human thinking per hour.
Let's give it a name: cost per unit of useful cognition.
For humans, that's not just salary. It's salary plus all the on-costs: management overhead, desk space, tooling, training, coordination overhead, and—let's be honest—meetings to decide what the meetings meant.
For AI, you've got a different stack of costs:
- • Model costs: per token, per call, licensing fees
- • Platform and orchestration: the infrastructure to make AI callable
- • Integration: hooking AI into existing systems and data
- • Governance and compliance: policy frameworks, audit trails, risk controls
- • Monitoring and observability: dashboards, logs, anomaly detection
- • Incident response: handling errors, rollbacks, escalation paths
So no, AI isn't "$1 per hour" when you account for everything. But the proposition still holds: once the plumbing is in place, the marginal cost of an additional thinking task trends toward cents, not dollars.
"The carrot isn't AI. The carrot is: We can throw 100x more thinking at our problems than we used to, for roughly the same spend."
That's a very different pitch than "let's automate some jobs."
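To make "cost per unit of useful cognition" concrete, here is a minimal sketch of the comparison. Every figure in it is an illustrative assumption (salary, on-cost loading, task volumes, plumbing cost, per-task inference cost), not data from this chapter; the point is the shape of the calculation, not the numbers.

```python
# Illustrative cost-per-cognition-task comparison.
# All figures are hypothetical placeholders, not data from this book.

# Human: fully loaded annual cost spread over productive "thinking tasks"
human_salary = 90_000
human_oncosts = 0.5 * human_salary          # management, space, tooling, training
human_tasks_per_year = 5_000                # assumed throughput
human_cost_per_task = (human_salary + human_oncosts) / human_tasks_per_year

# AI: fixed platform costs amortised over volume, plus marginal inference cost
ai_fixed_annual = 250_000                   # platform, integration, governance, monitoring
ai_marginal_per_task = 0.05                 # tokens + orchestration per task
ai_tasks_per_year = 500_000                 # assumed volume once the plumbing exists
ai_cost_per_task = ai_fixed_annual / ai_tasks_per_year + ai_marginal_per_task

print(f"Human: ~${human_cost_per_task:.2f} per task")   # ~$27.00
print(f"AI:    ~${ai_cost_per_task:.2f} per task")       # ~$0.55
```

Notice that the AI figure is dominated by the amortised plumbing, which is exactly why the volume conditions later in this chapter matter.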
The Investment Reality: What Companies Are Actually Spending
If this sounds theoretical, the numbers make it concrete. Global AI spending hit $154 billion in 2024, and it's forecast to reach $300 billion by 2027.
For the average mid-to-large enterprise, annual AI spend now sits around $6.4 million, broken down like this:
Average Enterprise AI Investment Breakdown (Annual)
- • AI Software & Platforms: $2.4M (+47% year-over-year growth)
- • AI Talent & Consulting: $1.8M (+52% year-over-year growth)
- • Infrastructure & Compute: $1.2M (+34% year-over-year growth)
- • Training & Development: $650K (+61% year-over-year growth)
Notice what's happening: the fastest-growing line item is training and development (up 61% year-over-year), followed by talent and consulting (up 52%). Companies are learning that the technology is the easy part. Building the capability—the people, processes, and judgment to deploy AI well—is where the real investment goes.
Per-Inference Economics: The Hidden Burn Rate
Let's zoom into the unit economics. The price collapse has been staggering: GPT-4 launched at $60 per million output tokens in 2023. By late 2025, GPT-4.1 nano costs just $0.40 per million—a 99% drop in under three years. Even GPT-5, with vastly superior capabilities, costs less than GPT-4 did at launch. Sounds like the problem is solved, right?
But scale changes everything. One fintech reported their enterprise chatbot was burning $400 per day per client. For AI companies like OpenAI, infrastructure costs represent roughly 75% of revenue. That's not a sustainable margin—it's a subsidy hoping for volume.
The Marginal Cost Promise
Here's where the economics get interesting. Once you've built the plumbing—the integration, governance, monitoring, and escalation paths—the cost of running one more query or handling one more task drops toward near-zero.
Compare that to humans:
Marginal Cost Comparison
Human Worker
To handle 10% more tasks, you typically need to:
- • Hire another person (months of lead time)
- • Pay full salary + on-costs
- • Onboard, train, and ramp up (3–6 months)
- • Accept coordination overhead increases non-linearly
Marginal cost: Nearly the same as average cost. Scaling is expensive and slow.
AI System
To handle 10% more tasks, you typically:
- • Run the same infrastructure (already built)
- • Pay a few extra cents in token costs
- • Scale instantly (no hiring, no onboarding)
- • Coordination overhead stays flat or decreases
Marginal cost: Pennies. Scaling is near-instant and near-free once the platform exists.
This is the promise executives are buying: elastic cognitive capacity. Apply 10x, 100x, even 1,000x more thinking to a problem without hiring a village.
But—and this is critical—that only holds if you've built the right plumbing. If every AI call requires manual review, custom integration, or one-off fixes, you've just recreated the human scaling problem with different tooling.
How to Talk to Executives About This
Language matters. A lot.
If you talk about "robots," executives hear science fiction. If you talk about "automation," they hear job cuts—and then HR and middle management start resisting.
But if you talk about cheap cognition, you're talking about capability expansion. You're shifting the conversation from cost centre to force multiplier.
The Pitch That Lands
"We can apply 100x more thinking to our strategic problems than we could before. The cost structure for analysis, decision support, and customer insight has fundamentally changed."
"Cognition used to be our constraint. Now it's abundant. The question is: where do we deploy it for asymmetric advantage?"
Notice what that does: it reframes AI from a replacement story to an amplification story. You're not cutting heads. You're expanding what's possible.
What Executives Actually Care About
Research shows that when enterprises evaluate AI investments, they prioritise:
- 1. Measurable value delivery (30%): Can you show ROI in business terms?
- 2. Industry-specific customisation (26%): Does it fit our context?
- 3. Price considerations (1%): Cost matters, but it's tertiary
Translation: executives don't care about tokens, parameters, or whether you're using GPT-5 versus Claude. They care about outcomes they can defend in a board meeting.
So frame your AI proposals in their language: the decisions that get faster, the coverage that improves, the revenue protected or the risk reduced. You're translating from technology capability to business impact. That's the conversation that unlocks budget.
The Reality Check: It's Not Actually "$1 Per Hour"
Let's be honest: the "$1 per hour AI" line is marketing. When you add up the on-costs—governance, monitoring, integration, incident response, model maintenance—AI isn't free.
But here's the crucial comparison: you need to benchmark AI's all-in cost against the all-in cost of human cognition. Not salary versus tokens. Total cost of ownership versus total cost of ownership.
When you do that math properly—and we'll detail the AI on-costs in Chapter 9—the economic case still holds for the right deployments. The key phrase: for the right deployments.
When the Economics Work
The cognition cost advantage isn't universal. It kicks in when you have:
High Volume
Enough cognitive tasks that the upfront investment in plumbing pays off quickly. If you're only running 100 queries a month, stick with humans.
Parallelisable Work
Tasks that can run simultaneously without coordination overhead. One human handles one thing at a time; one AI platform can handle thousands in parallel.
Time Flexibility
Work that doesn't require instant responses. Batch processing, overnight analysis, ticket queues—anywhere you can trade latency for accuracy and thoroughness.
If you're missing any of those three, the economics get murky fast. And if you need real-time, high-stakes, one-shot-correct answers—the kind where a customer is waiting on the other end—AI might actually be more expensive than a human when you account for all the guardrails and verification layers you need.
That's not a technology limitation. It's an economic reality. And understanding where AI's cost advantage kicks in versus where it evaporates is the difference between a project that compounds value and one that burns budget.
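One rough way to see where that line falls is a break-even calculation on volume. The figures below are placeholders carried over from the earlier sketch, not benchmarks; plug in your own plumbing and per-task costs.

```python
# Rough break-even volume for AI plumbing, with hypothetical figures.
plumbing_cost = 250_000        # one-off integration, governance, monitoring build
human_cost_per_task = 27.00    # fully loaded cost per cognitive task (assumed)
ai_marginal_per_task = 0.05    # inference + orchestration per task (assumed)

saving_per_task = human_cost_per_task - ai_marginal_per_task
break_even_tasks = plumbing_cost / saving_per_task
print(f"Break-even at roughly {break_even_tasks:,.0f} tasks")   # ~9,300 tasks
```

At 100 queries a month, this hypothetical plumbing takes the better part of a decade to pay back; at 10,000 a month it pays back within the first month. That is the "high volume" condition in plain arithmetic.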
Which brings us to the next chapter: the asymmetry nobody talks about.
Chapter Takeaways
- • The AI business case is about cost per unit of cognition, not "AI" as a buzzword
- • Human costs include massive on-costs: management, facilities, coordination, training—not just salary
- • AI costs include infrastructure, governance, monitoring, and ops—not just tokens
- • The promise: marginal cost of additional thinking trends toward near-zero once plumbing is built
- • Talk to executives about capability amplification, not job replacement—it unlocks budget and reduces political resistance
- • The economics work best for: high volume + parallelisable tasks + time flexibility
The Asymmetry Nobody Talks About
Two support teams at different companies receive the same customer question at 9am on a Monday.
Tale of Two Deployments
⚡ Scenario A: Live Chat
- • Customer frustrated, expects response in 30 seconds
- • AI chatbot has one shot to be right
- • Wrong answer → escalation → bad NPS
- • Screenshot-ready failure for social media
Success criteria: Fast + accurate + one chance
🎯 Scenario B: Ticket Queue
- • Response expected within 4 hours
- • AI agent can check history, cross-reference systems
- • Can escalate if uncertain
- • Customer already primed for measured response
Success criteria: Thorough + verifiable + time flexibility
The paradox: Same AI technology. Same customer question. Completely different success rates.
This asymmetry is why 72% of customers say chatbots are a complete waste of time—yet the same AI technology deployed in ticket systems delivers 40-60% cost savings with measurably better accuracy.
The Latency-Accuracy Trade-Off
In a live chat session, your AI effectively has one shot to be right. There's a real customer on the other end. They're frustrated. They've already waited through the automated phone tree. They're one bad experience away from tweeting about your brand.
No opportunity to be correct on the third attempt. No room for iterative refinement. No forgiveness for "getting there eventually."
What Real-Time Reliability Actually Costs
To get that one-shot accuracy, you need to throw everything at the problem:
- • A stronger model (more expensive per call, and increased latency)
- • Thinking models for complex reasoning (adds significant latency—seconds to minutes—trading speed for accuracy)
- • Rich context from multiple systems—retrieval, CRM, policies, knowledge base
- • Guardrails—security filters, PII checks, brand tone alignment
- • Maybe a second-pass checker to catch hallucinations and validate responses
All of that adds latency and cost per interaction. Every safeguard you add makes the response slower. Every verification step increases the risk that your customer is staring at a "typing..." indicator for 8, 10, 15 seconds.
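As a back-of-the-envelope illustration, here is a latency budget for a guarded live response. The stage names and timings are assumptions chosen for illustration, not measurements of any particular stack.

```python
# Hypothetical latency budget for a guarded real-time response (seconds).
stages = {
    "retrieve context (CRM, KB, policies)": 1.5,
    "primary model generation":             3.0,
    "guardrails (PII, security, tone)":     1.0,
    "second-pass hallucination check":      2.5,
    "formatting and delivery":              0.5,
}
total = sum(stages.values())
print(f"End-to-end: ~{total:.1f}s while the customer watches 'typing...'")  # ~8.5s
```

Trim any stage and you trade away accuracy or safety; keep them all and the customer waits.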
"Giving a quick and accurate answer is not something AI is equipped to deal with at the moment. You can't get it right on the third go around, because there's a real customer on the other end."
The Research Confirms It
Real-time AI monitoring consistently shows higher false positive rates compared to batch processing. Why? Limited context and the pressure to make quick decisions force the AI to operate with incomplete information.
— Galileo AI, "Real-Time vs Batch Monitoring for LLMs"
The same pattern holds across domains: when AI systems need to respond instantly, they trade accuracy for speed. More complex models provide higher accuracy but require more processing time. This fundamental trade-off shapes success or failure in deployment.
— BugFree AI, "Latency vs Accuracy Tradeoffs in Real-Time Systems"
Why Humans Still Win (For Now)
Here's the uncomfortable truth: in low-latency, high-stakes, ambiguous situations, a trained human support agent often still outperforms AI.
Even if that human takes 30 seconds or a minute to respond, they're bringing advantages that current AI can't replicate under time pressure:
Tacit Knowledge
Years of experience with internal tools. Pattern-matching that happens intuitively. "I've seen this before" recognition.
Real-Time Adjustment
Can read customer tone and pivot mid-conversation. "No, that's not what I meant" gets handled gracefully.
Judgment Calls
Knows when to break policy to keep a customer. Can spot when "standard answer" will make things worse.
Recovery Mode
"Let me try that again" works. Customers forgive humans being slightly wrong if correction is quick and genuine.
The human doesn't need perfect accuracy on the first try. They can course-correct. They can say "I'm checking with our billing team" and buy time without losing trust. They bring social intelligence to delicate situations.
AI in live chat doesn't get these affordances. It needs to be right, safe, on-brand, and fast—all at once.
The Chatbot Disaster Data
If this feels like we're being too harsh on chatbots, the customer satisfaction data is even harsher.
The Numbers Don't Lie
- • 72% say chatbot interaction is a "complete waste of time"
- • 78% were forced to connect with a human after the chatbot failed
- • 63% report the chatbot interaction did not result in resolution
- • 80% say chatbots increased their frustration level
— UJET survey, Forbes 2022
The kicker: 64% would prefer companies not use AI for service at all. 53% would switch to a competitor offering human support.
— LinkedIn analysis of UK customer satisfaction surveys, 2024
The Attribution Problem
When customers experience chatbot failures, they don't blame "this specific instance"—they blame AI capabilities as a category. Because AI capabilities are seen as relatively constant and not easily changed, customers assume similar problems will keep recurring.
This creates a trust death spiral: one bad experience poisons the well for future interactions. Unlike human service failures (which customers attribute to "that specific agent having a bad day"), AI failures feel systemic and unfixable.
— Nature, "Consumer Trust in AI Chatbots: Service Failure Attribution"
The Flip: When AI Has Time
Now reverse the scenario. What happens when you remove the time pressure?
When AI has latency tolerance—minutes, hours, or overnight batch processing—the entire game changes.
What Changes With Time Flexibility
Read Full History
AI can parse every past interaction, invoice, support ticket, and account note—no skimming required.
Cross-Check Systems
Check CRM, billing, product database, knowledge base, policy docs—sequentially or in parallel, no rush.
Multi-Step Reasoning
Chain-of-thought reasoning, verify assumptions, explore edge cases. Try multiple approaches and select the best response.
Escalate When Uncertain
Instead of guessing, flag complex cases for human review. No customer is waiting, so escalation doesn't feel like failure.
The customer already expects a measured response—minutes or hours, not seconds. They submitted a ticket, not initiated a chat. Their mental model already accommodates asynchronous communication.
In this context, AI can really put thought into each response. It's not fighting the clock. It's playing to its strengths.
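Here is a minimal sketch of what that queue-side workflow can look like. All of the helper functions and the confidence threshold are hypothetical stand-ins for whatever systems and checks you would actually wire in.

```python
# Sketch of an asynchronous "queue brain" loop.
# Every helper below is a stub standing in for a real integration; the names
# and the threshold are assumptions for illustration, not a reference design.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # assumed escalation cut-off

@dataclass
class Ticket:
    ticket_id: str
    customer_id: str
    body: str

def read_history(customer_id):               # stub: pull the full interaction history
    return []

def cross_check(ticket, systems):            # stub: query CRM, billing, KB, policies
    return {}

def draft_response(ticket, history, facts):  # stub: multi-step reasoning, no clock pressure
    return "draft reply"

def confidence(draft, facts):                # stub: self-assessed or checker-model score
    return 0.9

def handle_ticket(ticket):
    history = read_history(ticket.customer_id)
    facts = cross_check(ticket, systems=["crm", "billing", "kb", "policies"])
    draft = draft_response(ticket, history, facts)
    if confidence(draft, facts) < CONFIDENCE_THRESHOLD:
        return ("escalate_to_human", draft)   # nobody is waiting, so escalation is cheap
    return ("send", draft)

print(handle_ticket(Ticket("T-1", "C-42", "Why was I charged twice?")))
```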
The Batch Processing Economics
The cost structure completely inverts when you move from real-time to batch processing.
Infrastructure Cost Comparison
Batch processing typically delivers 40-60% infrastructure cost savings compared to real-time systems, with savings increasing at higher volumes.
Example: 1 million requests daily
Real-time system: 100 GPU instances running 24/7
Batch system: 20 GPUs running during off-peak hours
Same volume. Same work. 80% fewer resources.
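A quick calculation using the example above shows how the savings arise. The GPU hourly price and the eight-hour off-peak window are assumptions added for illustration; only the instance counts come from the example.

```python
# Worked version of the 100-GPU real-time vs 20-GPU batch comparison.
# The $2.50/hour price and the 8-hour batch window are illustrative assumptions.
gpu_hourly = 2.50

realtime_gpu_hours_per_day = 100 * 24   # peak-sized fleet running 24/7
batch_gpu_hours_per_day = 20 * 8        # smaller fleet running an off-peak window

realtime_cost = realtime_gpu_hours_per_day * gpu_hourly   # $6,000/day
batch_cost = batch_gpu_hours_per_day * gpu_hourly         # $400/day
print(f"Real-time: ${realtime_cost:,.0f}/day   Batch: ${batch_cost:,.0f}/day")
```

The gap in this toy version is wider than the headline 40-60% because it also assumes a short processing window and flat pricing; the direction of the saving, not the exact ratio, is the point.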
Beyond cost, batch monitoring provides more accurate analysis than real-time. With a broader view across datasets and time to verify patterns, false positive rates drop significantly.
— Galileo AI, "Real-Time vs Batch Monitoring for LLMs"
The Fundamental Asymmetry
We're describing two fundamentally different games with the same technology.
Game 1: Live Interaction
- Requirement: One-shot accuracy under time pressure
- Constraint: Must respond in seconds
- Cost: Expensive safeguards, higher model, still often fails
- Customer expectation: Immediate, perfect
- Error tolerance: Near zero
⚠️ AI's weak zone
Game 2: Batch/Queue
- Requirement: Iterative accuracy with verification
- Constraint: Can respond in minutes/hours
- Cost: 40-60% cheaper, higher accuracy
- Customer expectation: Thoughtful, thorough
- Error tolerance: Can escalate if uncertain
✓ AI's strong zone
Same AI capabilities. Completely different outcomes.
The asymmetry is this: AI is terrible at fast + accurate + one-shot. But it's brilliant at batch processing with time flexibility.
Most companies deploy AI exactly where it's weakest—live customer interactions with no room for error—and then wonder why it fails.
"If you're trying to replace a human in a live, high-pressure interaction, you're fighting the technology. If you're turning big slow queues into fast parallel flows, you're playing to its strengths."
Chapter Takeaways
- • AI faces a fundamental latency-accuracy trade-off that can't be engineered away
- • Live interactions require expensive safeguards that still often fail—72% of customers say chatbots waste their time
- • Humans still win in low-latency, high-ambiguity, high-stakes contexts through tacit knowledge and recovery
- • Batch processing delivers 40-60% cost savings and higher accuracy vs real-time systems
- • Hallucinations aren't the real risk—systemic wrongness at scale and silent drift are
- • Deploy in batch/queue contexts where time flexibility exists, not in live-chat contexts where humans excel
The 2x2 That Predicts Success
You've heard the pitch: AI will transform customer service, accelerate decision-making, automate operations. Then you deploy a chatbot that frustrates customers. Launch a real-time analytics system that costs three times more than projected. Build an autonomous approval engine that your compliance team refuses to certify.
The problem isn't the technology. It's that we lack a simple framework for predicting which AI projects will succeed versus which will fail. Let me give you one you can hold in your head.
TL;DR
- • Map AI projects on two axes: latency tolerance (can we wait?) × error cost (what happens if wrong?)
- • Bottom-right quadrant = prime AI territory: high latency tolerance, low error cost, 40-60% cost savings
- • Left side = human-led territory: low latency tolerance requires instant responses AI can't reliably deliver
- • Most chatbot failures are left-side problems deployed as right-side solutions
- • Move the conversation from "we need a chatbot" to "we need an AI queue brain"
The Two Dimensions That Matter
Every AI deployment decision can be mapped on two critical dimensions. Understanding where your use case sits on this framework will predict success or failure with remarkable accuracy.
X-Axis: Latency Tolerance
Can this work wait? That's the first question. Not "should it wait" but "can it wait without breaking the business process or disappointing the customer?"
Left Side: Must Respond in Seconds
Examples: Live chat support, synchronous API calls, real-time fraud detection, on-the-fly pricing, delicate negotiations, high-stakes emergency decisions
Constraint: Human is waiting. Every second of latency is visible and costly.
AI Challenge: Accuracy requires context, verification, and reasoning time. Speed demands shortcuts.
Right Side: Can Respond in Minutes / Hours / Overnight
Examples: Ticket queues, batch analytics, overnight reports, document processing, CRM scoring, research tasks, data reconciliation
Advantage: No one is waiting. AI can take time to verify, iterate, and improve quality.
AI Sweet Spot: Time flexibility allows extended thinking, multi-pass verification, and higher accuracy.
Y-Axis: Consequence of Error
What happens if the AI gets it wrong? This determines how much verification, oversight, and human review you need in the loop.
Bottom: Cheap to Fix
Examples: Internal notes wrong, trivial customer inconvenience, easy re-run, draft documents, research summaries
Recovery: Human can spot-check, customer can ask for clarification, system can re-process easily
Design Implication: High autonomy acceptable. Focus on throughput over perfection.
Top: Expensive / Regulated / Safety-Critical
Examples: Compliance violations, customer churn risk, legal exposure, financial loss, safety incidents, regulatory breaches
Risk: Error costs exceed benefit. Single failure damages trust or triggers audit.
Design Implication: Requires human-in-the-loop, approval gates, audit trails, and graceful escalation.
The AI Deployment Framework
🚨 Danger Zone
High-stakes + Real-time
Human-led ONLY
• Live emergency response
• Real-time compliance
• Critical negotiations
AI: Context surfacing only
⚖️ Copilot + Gate
High-stakes + Time flexibility
AI drafts, human approves
• Credit policy changes
• Complex pricing
• Regulatory reports
AI: Deep analysis + recommendation
⚡ Speed Over Perfection
Low-stakes + Real-time
AI-assisted, human-led
• Live chat (low-value)
• Simple FAQs
• Instant categorization
AI: Surface context, human responds
🎯 Prime AI Territory
Low-stakes + Time flexibility
Full AI autonomy
• Ticket triage & resolution
• Overnight batch analytics
• Report generation
• Document processing
40-60% cost savings here
The Four Quadrants Explained
Bottom-Right: Prime AI Territory
This is where AI dominates. High latency tolerance gives AI time to think, verify, and iterate. Low error cost means mistakes are cheap to catch and fix. The economics are compelling.
Sweet spots in this quadrant:
- Ticket queues: Triage incoming requests, resolve simple cases autonomously, draft detailed responses for complex cases, surface relevant context for human agents
- Overnight batch jobs: Transaction analysis, CRM lead scoring, anomaly detection, data quality checks, reconciliation tasks
- Report generation: Research synthesis, competitive intelligence, quality assurance at scale, documentation updates
- Internal operations: Document classification, data extraction, workflow orchestration, process monitoring
"Batch processing typically reduces infrastructure costs by 40-60% compared to real-time systems, with savings increasing at higher volumes. A real-time system processing 1 million requests daily might require 100 GPU instances running 24/7, while a batch system could process the same volume with 20 GPUs running during off-peak hours."— Zen van Riel, "Should I Use Real-Time or Batch Processing for AI?"
Top-Right: AI + Human Sign-off
When error cost is high but you have time flexibility, deploy the "copilot + gate" pattern. AI does the heavy analytical work—research, modeling, draft recommendations—and a human reviews and approves before execution.
Copilot + Gate: Use Cases
AI's Role
- • Analyze historical patterns and scenarios
- • Generate policy recommendations with supporting evidence
- • Model potential outcomes and risks
- • Draft comprehensive reports with citations
- • Surface edge cases and compliance considerations
Human's Role
- • Review AI's reasoning and verify accuracy
- • Apply judgment to political/cultural factors
- • Make final go/no-go decision
- • Take accountability for the outcome
- • Override when context demands different approach
Example scenarios: Credit policy changes, complex B2B pricing decisions, regulatory report preparation, strategic vendor recommendations, M&A due diligence analysis
The key: AI multiplies the human's analytical capacity by 10-100x, but the human retains decision authority and accountability. You get better decisions faster, without sacrificing oversight.
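One way to picture "copilot + gate" is as a hard stop between AI analysis and execution. The sketch below is illustrative only; the Recommendation fields and function names are assumptions, not a prescribed interface.

```python
# Sketch of a copilot + gate workflow: AI drafts, a named human approves, then execution.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    summary: str
    evidence: list                 # citations, scenarios modelled, edge cases surfaced
    risks: list
    approved_by: Optional[str] = None

def ai_draft(question: str) -> Recommendation:
    # stub: deep analysis, scenario modelling, drafted report with citations
    return Recommendation(summary=f"Draft answer to: {question}",
                          evidence=["historical pattern A"], risks=["edge case B"])

def human_gate(rec: Recommendation, reviewer: str, approve: bool) -> Recommendation:
    # the human reviews the reasoning, applies judgment, and takes accountability
    if approve:
        rec.approved_by = reviewer
    return rec

def execute(rec: Recommendation):
    if rec.approved_by is None:
        raise PermissionError("No execution without human sign-off")
    print(f"Executing: {rec.summary} (approved by {rec.approved_by})")

rec = human_gate(ai_draft("Should we tighten the credit policy?"), "Head of Risk", approve=True)
execute(rec)
```

The design choice that matters is the hard stop: nothing executes without a named human attached to the decision.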
Left Side: Human-Led Territory
When latency tolerance is low—when someone is waiting for an immediate response—human expertise still wins. This is true regardless of whether error cost is high or low.
Why humans still lead in real-time contexts:
- Accuracy requires time. AI needs context, verification, and reasoning cycles to deliver high-quality responses. Real-time pressure forces shortcuts that degrade accuracy.
- Humans read social cues. In live interactions—especially delicate or emotional ones—humans pick up tone, frustration, urgency signals that text-based AI misses.
- Escalation is inevitable. When AI hits its limits in a live context, the handoff to a human feels like failure. Better to start with human-led and use AI as augmentation.
"Over 72% of respondents in a recent UJET survey reported that interaction with a chatbot is a 'complete waste of time.' 78% of consumers have interacted with a chatbot in the past 12 months—but 80% said using chatbots increased their frustration level."— Forbes, "Chatbots and Automations Increase Customer Service Frustrations"
The Framework in Practice
Mapping Your Use Cases
Here's how to use this framework immediately:
- 1. List your current or planned AI projects. Be specific: "AI chatbot for customer support" or "Automated credit approval" or "Overnight analytics reports."
- 2. Ask the two questions for each project:
  - Can this work wait minutes/hours, or must it respond in seconds?
  - If AI gets this wrong, is it cheap to fix or expensive/regulated/safety-critical?
- 3. Place each project on the 2x2 grid (see the classifier sketch after this list). Be honest about where it actually sits, not where you wish it would sit.
- 4. Projects in the bottom-right quadrant: Green light. These are likely to succeed and deliver strong ROI. Focus your investment here.
- 5. Projects on the left side: Redesign as human-led with AI augmentation. Don't deploy as autonomous systems.
- 6. Projects in the top-right: Good candidates, but require approval gates and audit trails. Budget for human review overhead.
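If it helps to make the exercise mechanical, the two questions collapse into a tiny classifier. This is a sketch of the 2x2 as described in this chapter; the return strings are just shorthand for the quadrant guidance above, not an official scoring tool.

```python
# The 2x2 as a function: latency tolerance x error cost -> deployment posture.
def classify(can_wait: bool, error_cheap_to_fix: bool) -> str:
    if can_wait and error_cheap_to_fix:
        return "Prime AI territory: full autonomy (ticket triage, batch analytics, reports)"
    if can_wait and not error_cheap_to_fix:
        return "Copilot + gate: AI drafts, human approves (policy, pricing, regulatory)"
    if not can_wait and error_cheap_to_fix:
        return "Speed over perfection: AI-assisted, human-led (simple FAQs, live context)"
    return "Danger zone: human-led only, AI surfaces context (emergencies, live compliance)"

print(classify(can_wait=True, error_cheap_to_fix=True))
print(classify(can_wait=False, error_cheap_to_fix=False))
```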
Common Misallocations
The most common AI project failures happen when organizations misread where their use case sits on the grid:
The Chatbot Mistake
❌ What Organizations Think
- • "Live chat is just text-based, so AI can handle it"
- • "Customers will tolerate some errors if responses are instant"
- • "We'll save money by automating our support team"
Result: 78% of customers escalate to humans anyway. Frustration increases. AI gets blamed.
✓ What Actually Works
- • "Live chat is left-side: low latency tolerance"
- • "AI surfaces context; human owns the conversation"
- • "Ticket queues are right-side: AI can resolve 40-60% autonomously"
Result: Ticket resolution faster and cheaper. Live chat becomes more effective. Customers happier.
Real-Time Fraud Detection Without Escalation
Misallocation: Deployed as autonomous system in top-left quadrant (high error cost + low latency tolerance).
Problem: False positives block legitimate transactions. False negatives allow fraud through. No time for verification.
Fix: AI flags suspicious activity with confidence scores. High-confidence blocks proceed autonomously. Medium-confidence triggers human review. Low-confidence transactions pass with monitoring.
Pro Tip: Run a second batch job when latency isn't critical—with more detailed analysis, a smarter model, extended thinking, and comparison against broader data patterns and other customer interactions. This second pass catches what the real-time system missed, flagging additional transactions for review. Even tickets routed to human-in-the-loop should arrive with AI-generated augmentation: context summaries, similar past cases, confidence breakdowns, and recommended actions.
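A compact way to picture that fix is a confidence-tiered router plus a slower second pass. The thresholds and helper names below are illustrative assumptions, not a production fraud design.

```python
# Sketch of confidence-tiered fraud handling with a slower batch second pass.
# Thresholds and helpers are illustrative assumptions.
HIGH, LOW = 0.95, 0.60

def route_realtime(txn, score):
    if score >= HIGH:
        return "block"                 # high confidence: act autonomously
    if score >= LOW:
        return "human_review"          # medium confidence: queue for a person
    return "allow_and_monitor"         # low confidence: let it pass, keep watching

def second_pass(transactions, deep_score):
    # Overnight: stronger model, extended thinking, broader data patterns.
    return [t for t in transactions if deep_score(t) >= LOW]   # extra cases flagged for review

print(route_realtime({"amount": 4999}, score=0.72))   # -> human_review
```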
Autonomous Customer Service with No Fallback
Misallocation: Chatbot designed to handle all inquiries without clear escalation path.
Problem: AI loops on canned responses when it can't understand. Customer gets frustrated. Eventually finds phone number or gives up.
Fix: Recognize you're on the left side. Use AI for triage and simple cases only. Route complex cases to ticket queue for async resolution. Provide instant "talk to human" button.
Batch Processing: The Economics
The bottom-right quadrant's economic advantage isn't obvious until you understand the infrastructure math behind batch versus real-time processing.
Why Batch Wins
Batch processing delivers 40-60% cost savings because of four structural advantages:
| Dimension | Real-Time | Batch |
|---|---|---|
| Architecture | Complex: load balancers, autoscaling, hot standby, real-time data pipelines | Simple: scheduled jobs, sequential processing, standard ETL |
| Resource Provisioning | Sized for peak load 24/7, even if peak is 2 hours/day | Sized for total volume, run during off-peak pricing windows |
| Verification | Limited—speed is priority | Extensive—can iterate and verify before delivery |
| Operational Overhead | High: monitoring, alerting, incident response 24/7 | Low: failures can wait until morning, easier debugging |
"Batch = Simpler, slower, cheaper. Streaming = Faster, more complex, higher ops overhead. Most mature systems combine both—batch for deep, historical analysis; streaming for instant reactions. Pick the right model based on latency tolerance, data volume, and complexity."— Nikki Siapno, "Batch Processing vs Real-time Streaming"
The Infrastructure Math
Selective Real-Time Pattern
You don't have to choose between batch and real-time for your entire system. The smartest architectures use both:
- Apply real-time processing only to high-value or time-sensitive operations where the benefit of instant response justifies the cost
- Route everything else to batch queues for overnight or scheduled processing
- Use AI agents to decide routing based on request characteristics: urgency signals, customer tier, complexity assessment
This "selective real-time" pattern gets you the best of both worlds. Critical work happens instantly. Everything else benefits from batch economics and higher quality.
Moving the Conversation
The framework changes how you talk about AI projects. It shifts the conversation from technology ("we need a chatbot") to business context ("we need to process high-volume cognitive work with time flexibility").
From Chatbot to Queue Brain
Reframing the AI Discussion
❌ Stop Saying:
- • "We need an AI chatbot for customer support"
- • "Let's automate our live agents with AI"
- • "AI will make our real-time systems faster and cheaper"
✓ Start Saying:
- • "We need an AI queue brain for ticket resolution and an AI context engine for live agents"
- • "Let's use AI to multiply agent effectiveness by 2-3x while keeping humans in the driver's seat"
- • "AI excels in batch/queue contexts—let's find our highest-volume async workflows"
Impact: This reframing immediately filters out low-probability-of-success projects and focuses investment on the bottom-right quadrant where ROI is proven.
The Strategic Questions
When evaluating AI opportunities, ask these two questions first:
- 1. "Where do we have high-volume cognitive work with time flexibility?"
  Look for: ticket backlogs, overnight batch jobs, report generation, document processing, research tasks, data reconciliation, quality checks, triage workflows.
- 2. "Where are errors cheap to fix or catch before they cause harm?"
  Look for: internal-only processes, draft outputs that humans review, operations with undo capability, contexts where verification is fast/easy.
The intersection of these two questions is your AI project pipeline. Prioritize based on volume, current cost, and strategic importance.
Chapter Takeaways
Map AI projects on two axes: Latency tolerance (can this work wait?) × Error cost (what happens if AI is wrong?). This 2x2 predicts success better than any technology assessment.
Bottom-right quadrant = prime AI territory: High latency tolerance + low error cost delivers 40-60% cost savings, simpler architecture, and higher quality outputs.
Left side = human-led territory: When latency tolerance is low, humans still win. Deploy AI as augmentation (context surfacing, draft responses) not replacement.
Chatbot failures are allocation errors: 78% of customers escalate to humans because live chat sits on the left side (low latency tolerance) but is deployed as if it's on the right side.
Batch economics beat real-time: Same volume, 40-60% lower cost. Batch systems provision for total volume during off-peak hours. Real-time systems provision for peak load 24/7.
Selective real-time is the mature pattern: Route high-value/time-sensitive work to real-time processing. Route everything else to batch queues. Use AI to decide routing.
Reframe the conversation: Stop saying "we need a chatbot." Start saying "we need an AI queue brain for async work and AI augmentation for live agents."
Ask the two strategic questions: (1) Where do we have high-volume cognitive work with time flexibility? (2) Where are errors cheap to catch? The intersection is your AI pipeline.
What's Next
The 2x2 framework tells you where to deploy AI. But it doesn't tell you what becomes possible when cognition becomes abundant and cheap. That's what Version 3 unlocks: work that was previously too expensive to even attempt.
In the next chapter, we'll explore the three versions of AI value—and why most organizations are still stuck on Version 1 while the real transformation happens at Version 3.
The Three Versions of AI Value
Most AI conversations stop at "automation"—replace human tasks, save money, move on. But that's only the first version, and ironically, it has the highest failure rate. To understand why some companies achieve 10x returns while 85% fail, you need to see the full progression: from cost reduction to capability amplification to entirely new frontiers that were never rational to attempt before.
TL;DR: The Three Versions
- • Version 1 (Automation): Same work, fewer people—highest failure rate, competing with humans at what they do well
- • Version 2 (Scale): 10-100x more thinking at same problems—$3.70-$10.30 ROI per dollar, check everything instead of sampling
- • Version 3 (Frontier): Previously impossible thinking—work that was never rational because coordination overhead killed it
Version 1: Same Work, Fewer People
This is the classic automation play. Replace a human task with AI. Invoice processing, email triage, basic data entry—the stuff that shows up first in consultant decks and vendor demos. It's the most common deployment pattern, and it's also where most of the 70-85% failure rate lives.
Version 1: The Automation Play
What it is: Replace a human doing task X with AI doing task X
Value promise: Cost reduction through headcount savings
Why it often fails: You're competing with humans at what humans do reasonably well, in contexts optimised for human cognition over decades
When it works: Only in the right quadrant—high latency tolerance, low error cost, volume high enough to justify infrastructure
Why does this fail so often? Because you're asking AI to beat humans in environments humans designed for themselves. The workflows were built around human strengths. The tools reflect human mental models. The edge cases got handled through years of accumulated judgment.
When you drop AI into these contexts—especially live, high-stakes interactions—you're fighting the technology. And as we saw in Chapter 3, AI loses that fight 72% of the time in customer service scenarios.
When Version 1 Actually Works: Klarna's $40M Win
But Version 1 isn't always doomed. When deployed in the right context—remember the 2x2 from Chapter 4—it can deliver extraordinary results.
Case Study: Klarna's AI Assistant
The Setup
OpenAI-powered AI assistant handling customer service interactions
The Scale
Replaced work of 700 customer service agents
The Math
- • Direct salary savings: $28M (700 × $40K)
- • Operational costs: $12M+ (training, benefits, facilities)
- • Technology costs: -$2-5M
- • Net benefit: $35-38M annually
The Key Insight
This isn't live real-time support with one-shot accuracy requirements. It's ticket handling—batch-adjacent work where AI has time to think, verify, and escalate when uncertain.
Klarna's success illustrates the critical distinction: they didn't deploy AI for instant-response live chat. They used it for ticket-based interactions where the system had time to:
- • Read full customer history
- • Cross-check policies and account details
- • Run multi-step reasoning
- • Escalate complex cases to humans
That's bottom-right quadrant deployment: high latency tolerance, manageable error cost, massive volume. Version 1 in the right context delivers. Version 1 in live-chat contexts? That's the deployment 72% of customers call a complete waste of time.
Version 2: 10-100x More Thinking at Same Problems
This is where the conversation gets interesting. Version 2 isn't about replacing people—it's about amplifying cognitive output beyond what was economically feasible before.
The fundamental shift: instead of sampling, you check everything.
Version 2: The Scale Play
What it is: Apply 10-100x more cognitive analysis to problems you already have
Value promise: Depth and coverage impossible with human-only teams
Examples: One analyst sampling 50 transactions → AI checks every transaction. Triaging 100 tickets/day → AI triages 10,000. Spot-checking CRM → row-level analysis overnight.
ROI reality: Companies achieving Version 2 report $3.70-$10.30 return per dollar invested (Source: Fullview AI Statistics 2025)
The Marginal Cost Revolution
Here's why Version 2 changes the game: the economics of additional thinking have inverted.
Old Economics vs New Economics
Human Cognition Model
- • Each additional hour of thinking = full hourly cost
- • Marginal cost scales linearly
- • Must choose: sample or go bankrupt
- • 50 transactions checked, 10,000 not
- • "We can't afford to analyze everything"
AI Cognition Model
- • Infrastructure cost is fixed; usage is variable
- • Each additional task = marginal inference cost only
- • Checking 10,000 vs 50 costs ~same
- • "Once plumbing is in, marginal cost → cents"
- • "We can afford to analyze everything"
This is the promise from Chapter 2 made real: marginal cost per "thinking task" trends toward cents instead of dollars. Version 2 is where you cash that promise.
What Makes Version 2 Work
Three conditions consistently predict success:
- High volume. The more cognitive tasks, the better the ROI. If you're only analyzing 50 things, the infrastructure overhead doesn't justify AI. If you're analyzing 50,000 things, the math tilts heavily in AI's favor.
- Latency flexibility. Batch or async workflows preferred. Overnight analysis, scheduled reports, queue-based processing. When you remove the real-time pressure, AI can apply depth that humans can't match at scale.
- Errors are reversible or caught by humans. You're not betting the company on a single AI decision. Either the work is low-stakes, or there's a verification step, or mistakes get caught downstream.
Notice the pattern: none of these replaced humans. They amplified what humans could oversee. The analyst still reviews flagged transactions. The billing specialist still signs off on complex cases. The CS team still handles escalations. But the cognitive reach of each human expanded 10-100x.
"We can throw 100x more thinking at our problems than we used to, for roughly the same spend. That's not automation—that's capability amplification."
Version 3: Thinking That Was Previously Impossible
Now we reach the frontier—the work that organizations don't even attempt today because it would be economically or politically insane.
Version 3 isn't about accelerating current work. It's about making entirely new categories of work rational to attempt for the first time.
Version 3: The Frontier
What it is: Work that was never feasible before because coordination overhead, calendar time, or organizational patience killed it
Why it's possible now: AI doesn't have calendar time constraints, meeting fatigue, political navigation requirements, or coordination overhead that scales with team size
Value unlock: Not incremental improvement—entirely new competitive capabilities that were structurally impossible before
Examples: Hyper sprints (Chapter 6), marketplace-of-one personalization (Chapter 7), continuous strategic sensing, exhaustive scenario planning
The Human Limits Version 3 Bypasses
Why are certain kinds of thinking structurally impossible with human-only teams? Three constraints that don't apply to AI:
The Coordination Tax
Research from organizational behavior shows a brutal pattern: as team size increases, productive output per person plummets.
5-person team
Efficient. Everyone knows what everyone else is doing. Decisions happen in one room.
10-person cross-functional team
Getting messy. Need alignment meetings. Different functions have conflicting priorities. 50% of time goes to coordination.
25-person project team
Organizational nightmare. More meeting time than work time. Decisions optimized for consensus, not quality. "Everyone only tolerates a certain amount of ideas because everyone's just keen to get the job done."
50-100 person initiative
Never attempted. No one even tries because it's organizationally impossible. The coordination cost exceeds any plausible benefit.
AI multi-agent systems don't have this problem. A 100-agent system doesn't need meetings, politics, or consensus-building. Coordination overhead stays flat.
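The arithmetic behind that curve is the familiar count of pairwise communication channels: with n people, every pair is a potential channel that has to be kept in sync. A quick calculation (standard combinatorics, not data from this chapter) shows how fast it grows.

```python
# Pairwise communication channels in a team of n people: n * (n - 1) / 2
for n in (5, 10, 25, 100):
    print(f"{n:>3} people -> {n * (n - 1) // 2:>5} channels")
# 5 -> 10, 10 -> 45, 25 -> 300, 100 -> 4950
```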
The Calendar Time Trap
Large strategic initiatives take months with human teams. Not because the analysis itself requires months—but because of scheduling, coordination, iteration cycles, and political navigation.
A typical pattern for a major strategic decision:
- • Week 1-2: Frame the problem, align on objectives
- • Week 3-6: Parallel workstreams gather data
- • Week 7-8: Consolidate findings, inevitable gaps emerge
- • Week 9-10: Second round of analysis
- • Week 11-12: Draft recommendations, socialize with stakeholders
- • Week 13-14: Revisions based on feedback
- • Week 15-16: Final presentation, decision
That's four months for a decision that, in compute time, represents maybe 40-60 hours of actual analytical work. The rest is waiting—for meetings, for feedback, for availability, for political alignment.
AI doesn't wait. What if that same analytical depth happened overnight?
The Political Acceptability Filter
Here's the uncomfortable truth about large organizational decisions: they don't optimize for the best answer. They optimize for the politically acceptable answer.
"Those big teams working on something, they're never tasked with finding the best answer. It's always an acceptable answer. And what's an acceptable answer that the senior management will swallow? It's based on a lot of experience, sure, but it's experience of getting things past the senior managers."
Committee-think optimizes for consensus under time pressure. Bold ideas get watered down. Risky options get rejected not because they're wrong, but because no one wants to be the one who championed the failed initiative. The final recommendation is what the group can agree on, not necessarily what the analysis suggests.
AI search doesn't have that constraint. It can explore the full solution space—including options humans would self-censor for political reasons—and surface them with clear reasoning about trade-offs.
What Becomes Possible: Version 3 Examples
Hyper Sprints (Chapter 6)
Replace months of cross-functional committees with overnight AI exploration of thousands of strategic options, complete with reasoning trails and rejected alternatives. Human experts review in the morning and redirect for the next sprint.
Marketplace of One (Chapter 7)
Shift from segment-based strategies to per-customer personalization—offers, pricing, service levels, communications—economically rational for the first time because AI can manage the combinatorial complexity humans can't.
Continuous Strategic Sensing
Always-on analysis of all customer interactions, market signals, competitive moves, and internal operations—spotting emerging patterns no human analyst would catch because no one reads everything.
Exhaustive Scenario Planning
Stress-test every strategic option against hundreds of future scenarios, document why each succeeds or fails under which assumptions—analysis that would take a McKinsey team six months, delivered overnight.
The Version 3 Question
If you only ask "Where do we waste human thinking time?", you're stuck in Versions 1 and 2.
The Version 3 question is different:
"What's one project we've never attempted because the coordination overhead was too high? That might be our Version 3."
What strategic analysis have you not done—not because it wouldn't be valuable, but because assembling a 50-person team for six months was organizationally insane?
What per-customer customization have you not offered—not because customers wouldn't value it, but because managing that complexity manually would be impossible?
What patterns in your data have you not looked for—not because they wouldn't be revealing, but because no one has time to read everything?
Those are your Version 3 opportunities.
The Progression Table
Here's how the three versions compare across key dimensions:
| Dimension | Version 1 (Automation) | Version 2 (Scale) | Version 3 (Frontier) |
|---|---|---|---|
| What changes | Same work, fewer people | Same problems, 100x more thinking | New problems become rational |
| Value source | Cost reduction | Quality & coverage improvement | New capability creation |
| Success rate | 15-30% (unless right quadrant) | 60-75% | Unknown (frontier) |
| ROI range | Often negative → $2/dollar | $3.70-$10.30/dollar | Potentially 50x+ (strategic moats) |
| Time horizon | Immediate (months) | Near-term (6-18 months) | Strategic (2-5 years) |
| Competitive impact | Parity (everyone automates) | Advantage (better execution) | Moat (structural impossibility for others) |
| Org change required | Low | Medium | High (new operating models) |
| Example | Invoice processing chatbot | Fraud checking all transactions | Overnight strategic hyper sprints |
Where Most Organizations Are Stuck
The uncomfortable reality: most organizations are stuck in Version 1 because it's easiest to imagine. "AI, but inside the shapes of our current processes."
They look at their org chart, identify tasks humans currently do, calculate costs, and ask: "Can AI do this cheaper?"
That framing guarantees you miss Version 3 entirely—and explains why so many AI projects deliver underwhelming results.
Two Ways to Think About AI Projects
❌ The Version 1 Mindset
- • "Where do humans waste time?"
- • "What tasks can we automate?"
- • "How much can we save in headcount?"
- • Leads to: chatbots, RPA, task automation
Outcome: 70-85% failure rate, underwhelming ROI, "AI doesn't work for us"
✓ The Version 2-3 Mindset
- • "What if cognition was essentially abundant?"
- • "What analysis are we not doing because it's too expensive?"
- • "What was structurally impossible before?"
- • Leads to: exhaustive analysis, continuous sensing, new capabilities
Outcome: $3.70-$10.30 returns per dollar, strategic differentiation, compounding competitive advantages
The shift from Version 1 to Version 3 thinking requires asking a fundamentally different question.
What This Means for Your AI Strategy
The three-version framework gives you a lens to evaluate any AI project:
Version 1 Projects (Automation)
Treat with caution. Only proceed if:
- • You're in bottom-right quadrant (high latency tolerance, low error cost)
- • Volume is massive (10,000+ instances)
- • You're not competing with humans in contexts optimized for humans
Warning sign: If your pitch is "replace this person/team", expect 70%+ failure risk.
Version 2 Projects (Scale)
High confidence. Invest here. Look for:
- • Cognitive work you currently sample (now check everything)
- • Batch/async workflows with time flexibility
- • Opportunities to amplify human oversight 10-100x
Success pattern: $3.70-$10.30 return per dollar, humans stay in loop at critical points.
Version 3 Projects (Frontier)
Strategic bets. Start small, learn fast. Ask:
- • What analysis have we never done because coordination overhead was too high?
- • What per-customer customization can't we offer because complexity is unmanageable?
- • What strategic options have we never explored because it would take 50 people six months?
Strategic value: If successful, creates competitive moats competitors can't copy. Worth exploring even with uncertainty.
The Path Forward
Most organizations should pursue a portfolio approach:
- 10-20% Version 1: Low-risk automation wins in proven contexts (Klarna-style ticket handling)
- 60-70% Version 2: Scale plays with clear ROI—checking everything instead of sampling, amplifying human oversight
- 10-20% Version 3: Strategic exploration of previously impossible work—hyper sprints, marketplace-of-one, continuous sensing
The Version 1 projects keep the CFO happy with near-term savings. The Version 2 projects deliver measurable ROI and build organizational capability. The Version 3 projects create strategic separation from competitors.
"If you only look for wasted human thinking, you're missing the greenfield where there was no human thinking at all—because it was never feasible. Once you show executives that second territory, project ideas shift from 'let's bolt AI onto X' to 'what would we do if cognition was essentially abundant?'"
In the next two chapters, we'll make Version 3 concrete with detailed examples: hyper sprints that replace committee-think with systematic AI search (Chapter 6), and marketplace-of-one personalization that was never economically rational before (Chapter 7).
But first, let's be clear about what we've established:
Chapter 5 Key Takeaways
- • Version 1 (automation) has the highest failure rate—70-85%—because it competes with humans in contexts optimized for humans
- • Version 2 (scale) delivers $3.70-$10.30 per dollar by applying 100x more thinking to existing problems—check everything instead of sampling
- • Version 3 (frontier) enables work that was structurally impossible before—strategic analysis that would require 50 people for six months can happen overnight
- • Most organizations are stuck at Version 1 because they only ask "Where do we waste human time?" instead of "What was never feasible before?"
- • Klarna's $40M win shows Version 1 can work—but only in batch/ticket contexts with latency tolerance, not live chat
- • The marginal cost revolution: once infrastructure is in place, each additional "thinking task" costs cents, not dollars—changes what's rational to attempt
Hyper Sprints: Replacing Committee-Think
Picture the typical enterprise strategic project: ten cross-functional people, three months of meetings, and a PowerPoint deck that everyone can live with. Now imagine a different approach—one where thousands of possibilities are explored overnight, leaving humans to do what they do best: make the final call with full visibility into what was considered and why.
TL;DR
- • Committee-think optimises for consensus under time pressure, not for finding the best answer
- • Hyper sprints use AI to systematically explore thousands of possibilities overnight
- • Extended thinking and multi-model councils achieve 97% accuracy vs 80% for single models
- • Tasks that took cross-functional teams three weeks become 200+ iterations completed between midnight and 6am
The Committee-Think Problem
We've all seen how big decisions get made in organizations. A cross-functional team is assembled—representatives from finance, operations, marketing, IT, maybe legal. There's a series of workshops and meetings. Stakeholders negotiate. Political navigation happens throughout. The goal, whether stated explicitly or not, is rarely to find the best answer. It's to find an acceptable answer.
"What's an acceptable answer that senior management will swallow? Based on experience of getting things past managers, not exploration of what's actually optimal."
This isn't a criticism of the people involved—it's a structural constraint. Research on group decision-making reveals the mechanisms at play:
The Groupthink Mechanisms
Groupthink is a tendency to avoid critical evaluation of ideas the group favors. A poor or defective group decision influenced by groupthink is characterized by a failure to consider other, more favorable alternatives before reaching a conclusion.
— NYU Steinhardt, Groupthink as System
Research identifies four dimensions of defective decision-making:
1. Failure to create contingency plans
The group settles on one path without planning for what could go wrong
2. Lack of information search
Limited exploration of data that might challenge the emerging consensus
3. Biased assessment of costs and benefits
Evaluations skewed toward the preferred option
4. Incomplete consideration of options
Premature narrowing to a small set of alternatives
The outcome is predictable: committees aren't tasked with finding the best answer—they're tasked with finding an answer that key stakeholders will accept. The process becomes political theater, a negotiation between prior preferences rather than a genuine exploration of possibilities.
The Hyper Sprint Alternative
What if you could take a problem you'd normally hand to a cross-functional group for two to three months and replace it with something fundamentally different? Not a committee meeting, but a search process:
- → Thousands of AI calls overnight exploring multiple frames, scenarios, and constraints
- → Full audit trail of what was considered, rejected, and why
- → Human experts review in the morning and redirect the search based on insights
- → Politics happen after seeing the full landscape, not during exploration
The Chess Engine Analogy
Here's the crucial distinction: you're not asking AI to magically know the answer. You're asking it to systematically explore more possibilities than humans would have time for—like a chess engine exploring move trees.
In chess, the human sets the objectives (win the game), defines the constraints (legal moves), and provides the evaluation criteria (piece values, positional strength). The engine explores huge chunks of the possibility space. The result? Move sequences that humans would never have time to consider.
"Humans are terrible at exploring large idea spaces under time and social pressure. AI is good at it, as long as humans shape the scoring and constraints."— Scott Farrell
The same principle applies to strategic decisions: AI doesn't replace human judgment—it expands the search space so humans can judge from a position of visibility rather than guesswork.
Committee-Think vs Hyper Sprint
| Aspect | Committee-Think | Hyper Sprint |
|---|---|---|
| Optimises for | Consensus, political acceptability | Search coverage, idea quality |
| Time to explore | Constrained by meeting schedules | Unconstrained (overnight runs) |
| Ideas considered | Whatever fits in PowerPoint | Thousands of options explored |
| Audit trail | Sparse meeting notes | Full reasoning trail preserved |
| Politics | During exploration (constrains thinking) | After seeing landscape (informed) |
| Outcome | Acceptable to stakeholders | Best option identified, then negotiated |
Extended Thinking: The Enabling Technology
This kind of systematic exploration is only possible because of a fundamental shift in how AI systems work. Traditional language models optimize for speed—they generate the first plausible answer. But newer systems like OpenAI's o1 and Claude's extended thinking mode work differently.
Inference-Time Compute
o1 is trained with reinforcement learning to 'think' before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We're no longer bottlenecked by pretraining. We can now scale inference compute too.
— IESE, Chain of Thought Reasoning Breakthrough
What this means in practice: instead of optimizing models to answer instantly, we can give them compute budget to think through the problem. The AI explores multiple paths, evaluates trade-offs, considers edge cases—all before committing to an answer.
Translation: a smaller model given time to think can outperform a massive model forced to answer immediately. This inverts the economics. Instead of needing exponentially more training compute to improve performance, you can allocate inference-time compute—which is cheaper and scales linearly.
Extended Thinking in Practice
Extended thinking is a feature of both Claude Opus 4.5 and Claude Sonnet 4.5 that enables the model to expose its internal reasoning process in 'thinking' content blocks. Unlike standard mode—which optimizes for brevity and speed—extended thinking allocates more compute and context to produce deeper, multi-step reasoning workflows, crucial for complex code refactoring, strategic planning, and legal analysis.
— Comet API, How to Use Claude Extended Thinking
The trade-off is latency. Extended thinking prioritizes reasoning quality over raw speed. For hyper sprints, that's exactly what you want—overnight batch runs where thoroughness matters more than instant responses.
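To make inference-time compute concrete, here is a minimal sketch using the Anthropic Python SDK's extended thinking option. The model ID, token budgets, and prompt are illustrative assumptions; check the current API documentation and the models available to your account before relying on the exact parameters.

```python
# Minimal sketch: allocate an explicit "thinking" budget instead of forcing an
# instant answer. Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in
# the environment; model ID, budgets, and prompt are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",                            # assumed model ID
    max_tokens=16000,                                     # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},  # inference-time compute budget
    messages=[{
        "role": "user",
        "content": "Evaluate three market-entry options for Southeast Asia "
                   "against regulatory, competitive, and operational constraints.",
    }],
)

# The response interleaves 'thinking' blocks (the reasoning trail) with 'text'
# blocks (the answer) -- the kind of audit trail a hyper sprint wants to keep.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking[:200], "...")
    elif block.type == "text":
        print("[answer]", block.text)
```

For overnight batch runs, the same request shape is simply repeated across many framings and scenarios, trading latency for depth on each call.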
Discovery Accelerator Architecture
So how do you actually implement a hyper sprint? The architecture we've developed—called a Discovery Accelerator—uses three layers to create transparent, multi-dimensional reasoning that single-model systems can't replicate.
Layer 1: Director AI
Role: Orchestration, framing, curation, adaptation
Function: Sets objectives, defines constraints, curates results, redirects search based on human feedback
Example: "We need a market entry strategy for Southeast Asia. Consider regulatory, competitive, and operational constraints. Prioritize speed to market but flag high-risk options."
Layer 2: Council of Engines
Role: Specialized models with diverse perspectives
Function: Different models debate approaches—one optimizes for cost, another for speed, another for risk mitigation
Example: Operations brain flags supply chain constraints, revenue brain identifies monetization paths, risk brain surfaces regulatory blockers
Layer 3: Chess-Style Reasoning Engine
Role: Systematic exploration, rebuttal generation, pruning
Function: Explores idea combinations, generates counter-arguments, prunes dominated options, documents reasoning
Example: Explores ~100 strategic nodes/minute—deliberately paced for observability rather than raw speed—preserving full audit trail of what was considered and why options were rejected
The Feedback Dimension: Stream-of-Consciousness Relay
Role: Continuous reasoning relay between layers
Function: The stream-of-consciousness output from Layer 3 feeds back into Layers 1 and 2. The Director sees emerging patterns and adjusts framing. The Council receives new evidence to update their perspectives. Each layer's reasoning enriches the others in real-time.
Why it matters: Traditional pipelines are one-directional—data flows down, results flow up. This architecture creates a reasoning loop: the chess engine's exploration surfaces insights that reshape how the Director frames the problem and how the Council weighs trade-offs. A rejected path in Layer 3 might reveal a constraint the Council hadn't considered, triggering re-evaluation across all specialists.
Example: Chess engine explores "aggressive pricing" path → discovers regulatory constraint in market X → relays finding back → Risk brain escalates concern → Director reframes objective to "sustainable entry" → Council re-debates with new constraint → Chess engine explores revised solution space
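To make the three-layer loop concrete, here is a deliberately simplified, model-free sketch of the control flow. Every function is an illustrative stand-in: in a real Discovery Accelerator, framing, scoring, and expansion would each be LLM calls, and the audit trail would be persisted rather than held in a list.

```python
# Simplified sketch of the Director / Council / chess-style engine loop.
# All functions here are illustrative stand-ins for LLM calls.
from dataclasses import dataclass, field

@dataclass
class Node:
    idea: str
    scores: dict = field(default_factory=dict)   # perspective -> score
    verdict: str = "open"                        # open | pruned | promising
    rationale: str = ""

def director_frame(problem: str, findings: list[str]) -> str:
    """Layer 1: set or reset the framing; new findings can reshape it."""
    if any("regulatory constraint" in f for f in findings):
        return problem + " (reframed: prioritise sustainable, compliant entry)"
    return problem

def council_score(node: Node, framing: str) -> None:
    """Layer 2: each specialist brain scores the idea from its own angle."""
    risk_floor = 0.3 if "sustainable" in framing else 0.0
    node.scores = {
        "operations": 0.7 if "local partner" in node.idea else 0.4,
        "revenue":    0.8 if "premium" in node.idea else 0.5,
        "risk":       max(risk_floor, 0.2 if "aggressive pricing" in node.idea else 0.7),
    }

def engine_expand(node: Node) -> list[Node]:
    """Layer 3: generate follow-on ideas to explore from this node."""
    return [Node(idea=node.idea + " + " + variant)
            for variant in ("local partner", "premium tier", "phased rollout")]

# --- the loop: explore, score, prune, relay findings back to the Director ---
problem = "Market entry strategy for Southeast Asia"
frontier = [Node("aggressive pricing"), Node("premium positioning")]
audit_trail, findings = [], []

for _ in range(3):                                # depth of the overnight search
    framing = director_frame(problem, findings)   # feedback loop into Layer 1
    next_frontier = []
    for node in frontier:
        council_score(node, framing)
        if min(node.scores.values()) < 0.3:       # dominated on some dimension
            node.verdict, node.rationale = "pruned", "fails risk threshold"
            findings.append(f"regulatory constraint suspected for: {node.idea}")
        else:
            node.verdict = "promising"
            next_frontier.extend(engine_expand(node))
        audit_trail.append(node)                  # nothing considered is lost
    frontier = next_frontier

for node in audit_trail:
    print(f"{node.verdict:9s} {node.idea}  {node.scores}  {node.rationale}")
```

Even at this toy scale, the shape is visible: a pruned path in Layer 3 generates a finding, the finding reframes the Director's objective, and the Council re-scores everything that follows under the new framing.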
Why Multi-Model Councils
The diversity advantage isn't theoretical—it's measurable:
Multi-Model Performance:
- ✓ 97% accuracy with multi-model councils vs 80% for single models
- ✓ Implements Andrew Ng's four agentic design patterns: Reflection, Tool use, Planning, Multi-agent collaboration
- ✓ Diversity advantage is proven, not theoretical
Source: LeverageAI, Discovery Accelerator Architecture
Reasoning-Guided Search vs Traditional Search
The key architectural innovation is how search integrates with reasoning. Traditional AI search works like this:
❌ Traditional Approach: Search-Guided Reasoning
- User asks a question
- Search the web for relevant information
- LLM summarizes what was found
- Present summary to user
Problem: You get what the web happens to say about a broad topic, not targeted validation of specific strategic ideas. Generic information, not decision-relevant insights.
✓ Discovery Accelerator: Reasoning-Guided Search
- Chess engine generates specific strategic idea
- Generate targeted research questions FOR THAT IDEA
- Search for validation or contradiction of specific claims
- Feed findings back into scoring and exploration
Advantage: You search for what validates or challenges specific ideas, not generic information. Every search query is hypothesis-driven.
Source: LeverageAI, Reasoning-Guided Search
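A minimal sketch of the reasoning-guided pattern, with question generation and search stubbed out. In practice both stubs would be an LLM call and a real search API; the point is the direction of control: the idea generates the queries, and the evidence they return updates that idea's score.

```python
# Sketch: hypothesis-driven search. Each strategic idea generates its own
# targeted research questions; findings feed back into that idea's score.
# Both helper functions below are illustrative stand-ins, not real APIs.

def generate_research_questions(idea: str) -> list[str]:
    """Stand-in for an LLM prompt: 'What would validate or kill this idea?'"""
    return [
        f"What regulatory approvals does '{idea}' require in the target market?",
        f"Which competitors already offer something like '{idea}'?",
        f"What evidence contradicts the demand assumption behind '{idea}'?",
    ]

def search(query: str) -> list[dict]:
    """Stand-in for a web/search API call returning snippets with a stance."""
    return [{"snippet": f"(stub result for: {query})", "stance": "neutral"}]

def score_idea(idea: str, evidence: list[dict]) -> float:
    supports = sum(1 for e in evidence if e["stance"] == "supports")
    contradicts = sum(1 for e in evidence if e["stance"] == "contradicts")
    return 0.5 + 0.1 * supports - 0.2 * contradicts   # illustrative weighting

ideas = ["aggressive introductory pricing", "partner-led distribution"]
for idea in ideas:
    evidence = []
    for question in generate_research_questions(idea):   # queries come FROM the idea
        evidence.extend(search(question))
    print(f"{idea!r}: score {score_idea(idea, evidence):.2f}, "
          f"{len(evidence)} pieces of targeted evidence")
```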
Use Cases for Hyper Sprints
Where does this approach make the most sense? Anywhere humans are currently forced to prematurely narrow the solution space due to time or cognitive constraints.
Strategic Planning
Portfolio Selection & Scenario Planning
"Which of these 200 possible projects are we under-valuing and why? What assumptions about the future make each one succeed or fail?"
Example: Generate 500 portfolio combinations, stress-test against 20 different market scenarios, document which assumptions drive each outcome.
Workforce & Network Design
Rostering problems with hundreds of constraints—skills, availability, costs, travel logistics, regulatory requirements.
Example: Explore 10,000 schedule permutations overnight, flag optimal solutions and document trade-offs between cost, coverage, and compliance.
The Non-Prune Advantage
Here's what changes when compute is abundant: you can afford to not prematurely prune the solution space.
Previously, strategic planning meant narrowing down early just to stay sane. You'd start with 50 ideas, immediately cut to 10 "finalists," and then spend months evaluating those 10. The problem? The best option might have been in the 40 you cut before doing any real analysis.
The New Pattern (a minimal code sketch follows this list):
- → Generate many candidate strategies without premature filtering
- → Stress-test all of them against different future scenarios
- → Document why each fails or succeeds under which assumptions
- → Let humans decide with full visibility into the trade-offs
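Here is the minimal sketch referenced above, with illustrative candidates, scenarios, and scoring: every candidate is kept, every candidate is stress-tested against every scenario, and the output is a candidate-by-scenario matrix a human can interrogate rather than a shortlist chosen before any analysis.

```python
# Sketch: stress-test ALL candidate strategies against ALL scenarios and keep
# the full matrix. Scores and scenarios here are illustrative assumptions.
import random

random.seed(7)
candidates = [f"strategy-{i:03d}" for i in range(50)]       # no premature cut to 10
scenarios = {
    "recession":       {"demand": 0.6, "rates": 1.4},
    "steady-state":    {"demand": 1.0, "rates": 1.0},
    "rapid-expansion": {"demand": 1.5, "rates": 0.9},
}

def evaluate(candidate: str, conditions: dict) -> float:
    """Stand-in for a real model of value / risk under the scenario's assumptions."""
    base = random.uniform(0.3, 0.9)
    return round(base * conditions["demand"] / conditions["rates"], 3)

# Full matrix: every candidate x every scenario, with nothing thrown away.
matrix = {c: {name: evaluate(c, cond) for name, cond in scenarios.items()}
          for c in candidates}

# Humans decide with full visibility: e.g. robust candidates are those that
# clear a threshold in EVERY scenario, not just the expected one.
robust = [c for c, scores in matrix.items() if min(scores.values()) > 0.5]
print(f"{len(robust)} of {len(candidates)} candidates survive all scenarios")
for c in robust[:5]:
    print(c, matrix[c])
```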
Engineering Optimisation
Resource Allocation with Complex Constraints
Budget allocation across departments with interdependencies, capacity limits, strategic priorities, and political realities.
Example: Model 5,000 allocation scenarios, identify Pareto-optimal solutions, show which strategic goals are in tension and require human trade-off decisions.
Project ROI Comparison
Compare dozens of potential initiatives across multiple metrics—financial return, strategic alignment, risk, time to value, resource requirements.
Example: Score 100 projects against 15 weighted criteria, generate sensitivity analysis showing how rankings change under different strategic priorities.
The Bottom Line
Committee-think isn't a failure of people—it's a failure of time and cognitive constraints. When you have limited meeting hours and finite human attention, you optimize for consensus and acceptability, not for finding the best answer.
Hyper sprints remove those constraints. AI doesn't sleep, doesn't need meetings, and doesn't suffer from groupthink. It explores thousands of possibilities systematically, preserves the full audit trail, and lets humans make decisions from a position of visibility.
"The chess analogy holds: AI explores, humans set objectives and decide. The game hasn't changed—we've just expanded the number of moves we can consider before committing."— Scott Farrell
Chapter Takeaways
- ✓ Committee-think optimizes for consensus under time pressure, not for finding the best answer. Groupthink, political navigation, and premature pruning are structural constraints, not people failures.
- ✓ Hyper sprints optimize for search coverage and idea quality. Thousands of possibilities explored overnight, full audit trail preserved, politics happen after seeing the landscape.
- ✓ Extended thinking enables deeper reasoning when given inference-time compute. Small model + thinking time can outperform 14× larger model + instant response.
- ✓ Multi-model councils achieve 97% accuracy vs 80% single-model. Diversity advantage is proven—specialized models debate from different perspectives (operations, revenue, risk, culture).
- ✓ The chess analogy holds: AI explores, humans set objectives and decide. You're not asking AI to know the answer—you're asking it to systematically explore more possibilities than humans have time for.
- ✓ Tasks that took cross-functional teams weeks become overnight runs with 200+ iterations. AI agents don't sleep, don't need coordination meetings, and don't suffer from groupthink.
Marketplace of One
Why do we segment customers? Because treating each one individually was too expensive. That constraint has changed.
The Historical Constraint
Why Segmentation Exists
For decades, the economics of marketing and service delivery forced a fundamental compromise: we segment customers into groups and design around the "average customer in segment X." It wasn't the ideal approach—it was the only feasible approach.
The reasoning was sound:
- • Too cognitively expensive to treat every customer individually
- • Policies designed around demographic group averages
- • Campaigns built for standardised segments
- • Support flows optimised for operational efficiency
What Gets Lost
The trade-off was predictable and painful:
- • Individual preferences flattened to segment averages
- • Outliers poorly served by standardised approaches
- • One-size-fits-most becomes one-size-fits-none
- • Opportunities for personalised value creation left on the table
$1 Trillion
The estimated value shift from standardisation to personalisation across US industries alone.
Companies that grow faster drive 40% more revenue from personalisation than their slower-growing counterparts. More than 70% of consumers now consider personalisation a basic expectation—not a premium feature.
— McKinsey & Company
The Economic Shift
The constraint that justified segmentation—the prohibitive cost of individual treatment—has fundamentally changed. Research from McKinsey quantifies what many companies are beginning to discover: personalisation at scale represents one of the largest value-creation opportunities in modern business.
The Revenue Impact
The numbers are compelling:
- ✓ Personalisation typically drives 10–15% revenue lift
- ✓ Company-specific lift ranges from 5–25%, driven by sector and execution capability
- ✓ Shifting to top-quartile performance would generate over $1 trillion in value across US industries
"AI doesn't optimise for average—it adapts to context. Every interaction can be unique. Every path can be recalculated. Every response can be personalised."
The Cost Structure Flip
What has changed is fundamental: the economics of personalisation have inverted.
Previously: Customisation was expensive. Manual effort scaled linearly with customer count. Individual treatment required prohibitive human resources.
Now: Recomputing per-customer recommendations costs less than maintaining rigid, segment-based rules that inevitably generate constant exception handling, manual overrides, and customer frustration.
The human couldn't manage the combinatorial complexity of thousands of individual customer profiles, each with unique history, preferences, and context. AI can track per-customer context, state, and behavioural patterns—and remain coherent.
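A minimal sketch of that flip, with assumed fields and thresholds: the segment rule collapses everyone in "segment B" to one treatment, while the per-customer function recomputes from each individual's own context. Nothing here is a real recommendation model; it only illustrates where the decision logic moves.

```python
# Sketch: segment rule vs per-customer recomputation. All fields and rules
# below are illustrative assumptions, not a real recommendation model.

SEGMENT_RULES = {"A": "premium upsell", "B": "retention discount", "C": "no action"}

def segment_offer(customer: dict) -> str:
    return SEGMENT_RULES[customer["segment"]]        # everyone in B gets the same thing

def per_customer_offer(customer: dict) -> str:
    # Recomputed from the individual's own history, preferences, and context.
    if customer["support_tickets_90d"] >= 3:
        return "proactive outreach from a senior agent"
    if customer["usage_trend"] < -0.2:
        return f"re-engagement offer on {customer['preferred_channel']}"
    if customer["feature_requests"]:
        return f"beta invite for {customer['feature_requests'][0]}"
    return "premium upsell timed to renewal"

customers = [
    {"id": 1, "segment": "B", "support_tickets_90d": 4, "usage_trend": 0.1,
     "preferred_channel": "email", "feature_requests": []},
    {"id": 2, "segment": "B", "support_tickets_90d": 0, "usage_trend": -0.3,
     "preferred_channel": "sms", "feature_requests": ["api access"]},
]

for c in customers:
    print(c["id"], "| segment rule:", segment_offer(c),
          "| per-customer:", per_customer_offer(c))
```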
What Becomes Possible
Per-Customer Design
When the constraint of cognitive overhead disappears, entirely new design patterns become rational:
Marketplace of One: Use Cases
Marketing
- • Per-customer campaign messaging
- • Dynamic creative optimisation
- • Individualised timing and channel selection
- • Offer personalisation beyond segment rules
Service
- • Personalised support flows
- • Individual escalation thresholds
- • Communication preference matching
- • Proactive outreach based on individual patterns
Pricing & Risk
- • Dynamic pricing based on individual behaviour
- • Personalised risk assessment
- • Custom terms and conditions
- • Individual credit decisions
Mass Personalisation vs Mass Customisation
The distinction matters:
- • Mass customisation: Caters to the needs of large user cohorts and their special requirements
- • Personalisation: Focuses on the needs of a particular individual
With advanced AI technology, achieving an intimate understanding of individual customer needs has become both realistic and financially promising.
— Intellias
The Results Data
Hyper-Personalisation Performance (2025)
Businesses leveraging AI-driven personalisation at scale are reporting dramatic performance improvements:
- • 62% higher customer engagement and 80% better conversion rates, compared to traditional segment-based approaches
— AI Magicx, 2025
What GenAI Enables
The capabilities that make marketplace-of-one feasible:
- → Real-time data analysis across vast customer datasets
- → Aspiration and behaviour identification—not just stated needs
- → Proactive trend anticipation and challenge resolution
- → True individualisation—moving beyond traditional customer segmentation
Always-On Sense-Making
One of the most powerful applications of marketplace-of-one thinking extends beyond customer-facing interactions:
- • Continuously reading tickets, emails, chats, documents, and logs
- • Spotting emerging problems, patterns, and opportunities as they develop
- • Proposing hypotheses: "It looks like X might be happening because of Y"
- • Acting as a standing "organisational brain" that never gets bored
You're not going to assign a human team to read everything, all the time—they'd mutiny. But an AI system can be a persistent sense-making layer across your organisation that never fatigues.
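A minimal sketch of one sensing pass, under assumed inputs: each batch window, the system scans new tickets, chats, and log lines, looks for terms that co-occur across channels, and drafts hypotheses for a human to confirm or dismiss. A production version would use an LLM over real message streams rather than keyword counts.

```python
# Sketch: an always-on sense-making pass over tickets, chats, and logs.
# Keyword co-occurrence stands in for what an LLM would actually do;
# the items and stopword list are illustrative assumptions.
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "for", "out", "after", "last", "hour", "ticket", "log", "chat"}

new_items = [
    "ticket: pdf export fails after the latest release",
    "ticket: pdf export times out for large reports",
    "log: pdf export worker restarted 14 times in the last hour",
    "ticket: password reset email delayed",
]

# Count which terms show up together across items in this batch window.
pair_counts = Counter()
for item in new_items:
    terms = {t for t in item.lower().replace(":", " ").split()
             if len(t) >= 3 and t not in STOPWORDS}
    pair_counts.update(combinations(sorted(terms), 2))

# Draft hypotheses for a human to confirm, dismiss, or investigate further.
hypotheses = [
    f"It looks like '{a}' and '{b}' keep appearing together "
    f"({count} items this window), possibly one underlying incident."
    for (a, b), count in pair_counts.most_common(3) if count >= 2
]

for h in hypotheses:
    print("HYPOTHESIS FOR HUMAN REVIEW:", h)
```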
This Is Version 3
Not Automation
Marketplace of one represents Version 3 AI value creation:
- × Not Version 1: We're not doing old work faster
- × Not Version 2: We're not just applying more thinking to existing problems
- ✓ Version 3: We're creating new work that was never feasible—a new class of product and service design
What It Requires
Marketplace-of-one isn't plug-and-play. It requires genuine capability building:
Data Infrastructure
Systems to track, store, and retrieve individual customer context at scale. Not just transaction history—behavioural patterns, preference signals, and interaction context.
AI Systems
Models capable of processing per-customer recommendations in real-time or near-real-time. The ability to compute thousands of individualised responses efficiently.
Business Processes
Operational workflows that can receive and act on individual recommendations. Systems flexible enough to handle per-customer variation without breaking.
Governance
Frameworks for personalised decisions at scale. Ensuring fairness, compliance, and auditability when every customer receives unique treatment.
The Strategic Question
"What would we design if we had a smart assistant assigned to every customer—and every employee?"
The answers to that question represent the marketplace-of-one opportunity. They're the products, services, and experiences you currently dismiss as "too complex to manage." They're Version 3.
Key Takeaways
- • Segmentation exists because individual treatment was too expensive—that constraint has changed
- • $1 trillion opportunity in personalisation across US industries (McKinsey)
- • Cost structure has flipped: per-customer computing now cheaper than one-size-fits-none
- • Results: 62% higher engagement, 80% better conversion (AI Magicx)
- • Marketplace of one = Version 3: new work that wasn't feasible before
AI as Cognitive Exoskeleton
The pattern that works across all three versions of AI value isn't about replacement. It's about amplification.
AI does the pre-work. Humans own the moment.
TL;DR
- • Medical diagnostics improve from 72% to 80% sensitivity with AI assistance—not replacement, amplification
- • Multi-agent orchestration delivers 90.2% improvement over single-agent systems in research tasks
- • The cognitive exoskeleton pattern: AI saturates pre-work, human owns judgment and relationships
- • Token economics: multi-agent systems use 15x more tokens, so deploy on high-value tasks only
The Mental Model Shift
The difference between Version 1 and Version 3 thinking comes down to where you place AI in the workflow.
From Brittle Autonomy to Robust Augmentation
Old Mental Model
- • "AI answers the customer"
- • Fragile, one-shot, high failure rate
- • Fighting the latency-accuracy trade-off
- • 72% say chatbots waste time
- • Human escalation = system failure
New Mental Model
- • "AI does everything leading up to the moment where the human answers"
- • Robust, augmentative, plays to strengths
- • Human keeps judgment and relationships
- • 90.2% improvement with orchestration
- • Human escalation = system design
"The real power of AI lies in amplification, not automation. It doesn't remove human input—it multiplies its impact."— Anthony Coppedge, AI as Exoskeleton
The Pre-Work Pattern
Instead of asking AI to handle the customer interaction, ask it to prepare the human who will.
What AI Can Do Before the Human Acts
When a customer message arrives, AI can (a minimal code sketch follows this list):
- → Mine CRM for truly relevant past interactions and summarize context
- → Infer what the customer probably cares about based on history and current message
- → Pull knowledge base articles, policies, and similar resolved cases
- → Surface a rich cockpit with context, suggested actions, draft responses, and risks to watch for
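Here is the minimal sketch referenced above. Every helper is a stand-in for a CRM query, a knowledge-base search, or an LLM call; the names and fields are illustrative. The output is the cockpit handed to the human agent, not a reply sent to the customer.

```python
# Sketch: AI saturates the pre-work, the human owns the reply. All helpers are
# illustrative stand-ins for CRM queries, knowledge-base search, and LLM calls.

def fetch_crm_history(customer_id: str) -> list[dict]:
    return [{"date": "2025-10-02", "summary": "Complained about delayed refund"},
            {"date": "2025-11-14", "summary": "Upgraded to annual plan"}]

def rank_relevant(history: list[dict], message: str, top_k: int = 3) -> list[dict]:
    """Stand-in for semantic ranking: keep interactions sharing words with the message."""
    words = set(message.lower().split())
    return sorted(history,
                  key=lambda h: len(words & set(h["summary"].lower().split())),
                  reverse=True)[:top_k]

def retrieve_kb(message: str) -> list[str]:
    return ["Policy 4.2: refunds processed within 10 business days"]

def draft_reply(message: str, context: list[dict]) -> str:
    return ("Hi -- I can see your refund from October is still open. "
            "I've escalated it and will confirm a date by tomorrow.")

def build_cockpit(customer_id: str, message: str) -> dict:
    history = fetch_crm_history(customer_id)
    relevant = rank_relevant(history, message)
    return {
        "context": relevant,
        "kb_articles": retrieve_kb(message),
        "draft_reply": draft_reply(message, relevant),  # a suggestion, never an auto-send
        "risks": ["Customer has raised the refund twice: churn risk if unresolved"],
    }

cockpit = build_cockpit("cust-001", "Where is my refund? This is the second time I've asked.")
for section, content in cockpit.items():
    print(section.upper(), "->", content)
```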
What This Gives the Human
Faster
Less clicking through screens and searching through documentation. The human agent sees a prepared dashboard instead of scattered data sources.
More Accurate
Better context than they'd find alone. AI can process thousands of past interactions to surface the three that actually matter for this customer's situation.
Still in Control
The human owns judgment and relationship handling. They see the AI's suggestions as inputs, not commands. They bring tacit knowledge, social intelligence, and recovery from error.
The Evidence for Augmentation
This isn't theory. The medical field provides the cleanest evidence that augmentation outperforms replacement.
Medical Results
Medical professionals using AI-enhanced diagnostics demonstrate significant performance improvements. Studies show AI assistance increasing diagnostic sensitivity from 72% to 80% and specificity from 81% to 85% for fracture detection, with 91.3% sensitivity for lesion detection compared to 82.6% for human-only interpretation. AI reduces diagnostic time significantly.
Source: EY, Human-Machine Economy report
Those numbers tell the story: AI alone isn't better than humans alone. But AI assisting humans beats either party working solo.
The Physical Exoskeleton Parallel
AI-powered exoskeletons and wearable robotics can augment human strength and endurance. The Exia model by German Bionic is the first AI-augmented exoskeleton: it captures billions of biomechanical data points, learns from user movements, and delivers up to 38 kg of adaptive lifting assistance.
Source: e-Novia, Human Augmentation Technologies report
The physical exoskeleton doesn't replace the human worker. It amplifies their capability. The same principle applies to cognitive work.
Brain Cache Research
Brain Cache, a Generative AI-powered cognitive exoskeleton acting as a second brain for humans, achieves cognitive augmentation through three mechanisms: externalizing biological memory, structuring knowledge, and activating insights. By creating a mirror system that externalizes, reorganizes, and reactivates knowledge in rhythm with biological learning cycles, we enable humans to consciously participate in their own cognitive evolution.
Source: MIT, GenAI & HCI Conference Paper
Multi-Agent Orchestration
The augmentation pattern doesn't stop at one AI assisting one human. It extends to teams of AI agents coordinating to amplify a single human's capability.
The Performance Data
We found that a multi-agent system with Claude Opus 4.5 as the lead agent and Claude Sonnet 4.5 subagents outperformed single-agent Claude Opus 4.5 by 90.2% on our internal research eval. Our internal evaluations show that multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously.
Source: Anthropic, Multi-Agent Research System
That 90.2% improvement isn't a typo. It's the difference between one AI trying to do everything and a coordinated team with specialized roles.
Enterprise Results
The pattern scales beyond research tasks. Enterprises deploying orchestrator-worker multi-agent patterns in sales, finance, and support see up to 30% increases in process efficiency, with error rates reduced by up to 25%.
How It Works
A central orchestrator agent uses an LLM to plan, decompose, and delegate subtasks to specialized worker agents or models, each with a specific role or domain expertise. This mirrors human team structures and supports emergent behavior across multiple agents.
Multi-Agent Architecture Pattern
Orchestrator Agent
- • Receives task from human
- • Decomposes into specialized subtasks
- • Delegates to worker agents
- • Synthesizes results back to human
Worker Agents
- • Each has domain expertise (sales, legal, technical, etc.)
- • Execute narrow, well-defined tasks
- • Return results to orchestrator
- • Can be different models optimized for cost/performance
Human Role
- • Sets strategic direction
- • Reviews synthesized options
- • Makes final judgment calls
- • Handles relationships and accountability
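A minimal sketch of the orchestrator-worker shape, with all LLM calls stubbed out: the orchestrator decomposes the task, fans subtasks out to domain workers in parallel, and synthesises the results for the human to judge. The worker roles, the decomposition, and the thread-pool choice are illustrative assumptions.

```python
# Sketch: orchestrator-worker multi-agent pattern. Each worker stands in for a
# specialised agent (its own model, tools, and prompt); stubs return canned text.
from concurrent.futures import ThreadPoolExecutor

def sales_worker(subtask: str) -> str:
    return f"[sales] pipeline impact assessment for: {subtask}"

def legal_worker(subtask: str) -> str:
    return f"[legal] contract and compliance review for: {subtask}"

def technical_worker(subtask: str) -> str:
    return f"[technical] integration effort estimate for: {subtask}"

WORKERS = {"sales": sales_worker, "legal": legal_worker, "technical": technical_worker}

def decompose(task: str) -> dict[str, str]:
    """Stand-in for the orchestrator's LLM planning step."""
    return {
        "sales": f"Which accounts are affected by '{task}'?",
        "legal": f"What terms need renegotiation under '{task}'?",
        "technical": f"What systems change to support '{task}'?",
    }

def orchestrate(task: str) -> str:
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = {domain: pool.submit(WORKERS[domain], sub)
                   for domain, sub in subtasks.items()}
        results = {domain: f.result() for domain, f in futures.items()}
    # Stand-in for the synthesis step: in practice another LLM call reconciles
    # the specialist answers into options for the human to judge.
    return "\n".join(results.values())

print(orchestrate("move enterprise pricing to usage-based billing"))
```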
Token Economics Reality
In our data, agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats. For economic viability, multi-agent systems require tasks where the value of the task is high enough to pay for the increased performance.
Source: Anthropic, Token Economics Research
This is the trade-off: 15x token cost for 90.2% performance improvement. It makes economic sense for high-stakes decisions (M&A analysis, strategic planning, complex technical architecture). It doesn't make sense for routine email summaries.
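A back-of-envelope sketch of that viability test, using the 4x and 15x multipliers from the quoted research; the baseline cost, task values, and required margin are placeholder assumptions, not published pricing.

```python
# Sketch: when does the 15x token cost of a multi-agent system pay for itself?
# The baseline cost, task values, and margin are illustrative assumptions.
CHAT_COST = 0.05          # assumed cost of a single chat interaction, in dollars
MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}   # from the quoted data

def run_cost(mode: str) -> float:
    return CHAT_COST * MULTIPLIER[mode]

def worth_it(task_value: float, mode: str, min_return: float = 10.0) -> bool:
    """Require the task's value to exceed the run cost by a healthy margin."""
    return task_value >= run_cost(mode) * min_return

for task, value in [("routine email summary", 2.0),
                    ("M&A target screening memo", 5_000.0)]:
    print(f"{task}: multi-agent run ~${run_cost('multi_agent'):.2f}, "
          f"deploy multi-agent: {worth_it(value, 'multi_agent')}")
```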
Where This Pattern Applies
The cognitive exoskeleton pattern works anywhere humans face high-stakes, time-sensitive interactions that benefit from exhaustive preparation.
Live Customer Interactions
- → Sales calls: AI surfaces account history, suggested talking points, competitor intel, pricing options
- → Support: AI prepares context, root cause analysis, suggested solutions, escalation triggers
- → Negotiations: AI analyzes options, precedent deals, risk scenarios—human makes judgment calls
Internal Workflows
- → Approvals: AI prepares cost-benefit analysis, risk assessment, compliance check—human signs off
- → Risk reviews: AI surfaces patterns across thousands of transactions—human makes assessment
- → Strategic decisions: AI explores 100+ options systematically—human chooses direction
Professional Services
- → Medical consults: AI prepares differential diagnosis context, relevant research, similar cases
- → Legal work: AI surfaces relevant precedent, contract language, risk flags—lawyer makes judgment
- → Financial advice: AI prepares portfolio analysis, scenario modeling, tax implications
The Common Thread
In all these cases:
- • The interaction is high-stakes and time-sensitive
- • AI saturates the pre-work and side-work
- • The human owns the moment of judgment
- • Relationships and accountability stay with the human
Not Brittle Autonomy
The cognitive exoskeleton pattern succeeds where chatbot autonomy fails because it isn't forcing AI into a one-chance-to-be-perfect role.
What Each Party Brings
What Humans Bring
- • Tacit knowledge
- • Social intelligence and relationship handling
- • Judgment under ambiguity
- • Recovery from error
- • Accountability
What AI Brings
- • Exhaustive search and retrieval
- • Pattern matching at scale
- • Consistent application of criteria
- • Parallel processing across data sources
- • Tireless attention to detail
Each party plays to their strengths. AI doesn't need to be perfect because it isn't making the final call. Humans don't need to manually search through thousands of records because AI has already done that work.
The result: faster, more accurate, and more robust than either party working alone.
Chapter Takeaways
- → Shift from "AI answers" to "AI does everything leading up to the answer"
- → Medical evidence: 72% to 80% diagnostic sensitivity with AI assistance—8 percentage point improvement from augmentation
- → Multi-agent orchestration delivers 90.2% improvement over single-agent systems
- → Token cost reality: multi-agent = 15x chat cost, so deploy on high-value tasks only
- → The exoskeleton pattern applies everywhere: sales, support, medical, legal, strategic planning
- → Not replacement—amplification. AI brings exhaustive search and pattern matching; humans bring judgment and accountability
The Right Questions
You've seen the deployment matrix. You understand the three versions of AI value. You know why chatbots fail and where batch processing wins. Now comes the implementation reality check.
Because "$1 per hour AI" is a fantasy. And the questions most organisations ask point them directly toward the 95% failure rate.
The Real AI On-Costs
Per-token pricing creates the illusion that AI is cheap. GPT-4o costs $10 per million output tokens. That sounds like pennies. But AI doesn't run on tokens alone—it requires an ecosystem to operate. The true costs look less like hiring a contractor and more like hiring a department.
Toolchain & Infrastructure
What you need: Retrieval systems (vector stores for RAG), orchestration platforms to coordinate multi-step workflows, observability and logging infrastructure to track what's happening.
Reality check: These aren't optional. Without them, you're flying blind.
Operations & Monitoring
What you need: Logs, alerts, dashboards showing token consumption and costs, anomaly detection for model drift, performance monitoring across the stack.
Cost example: In research contexts, monthly spend on evaluation and tooling ranges from $31,300 to $58,000. Enterprises need to alert if budgets deviate by more than 5%.
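A minimal sketch of that budget-deviation alert, with assumed numbers: compare actual spend against the monthly budget pro-rated to today, and flag anything more than 5% over.

```python
# Sketch: alert when AI spend deviates more than 5% from the pro-rated budget.
# Budget, spend, and the data source are illustrative assumptions.
from typing import Optional

def budget_alert(monthly_budget: float, spend_to_date: float,
                 day_of_month: int, days_in_month: int = 30,
                 threshold: float = 0.05) -> Optional[str]:
    expected = monthly_budget * day_of_month / days_in_month
    deviation = (spend_to_date - expected) / expected
    if deviation > threshold:
        return (f"ALERT: spend ${spend_to_date:,.0f} is {deviation:.0%} above "
                f"the pro-rated budget (${expected:,.0f})")
    return None

print(budget_alert(monthly_budget=45_000, spend_to_date=21_000, day_of_month=12))
```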
Governance & Risk
What you need: Policies defining which models can be used and when, approval workflows for high-stakes deployments, audit records for regulators and compliance teams.
Reality check: Risk officers start sweating when they can't trace how a decision was made.
Model & Prompt Maintenance
What you need: Ongoing prompt tuning as models and products evolve, model version management and testing, workflow updates as business processes change.
Reality check: Workflows decay. Products change. Prompts that worked last quarter may not work next quarter.
Change Management
What you need: Training staff on new workflows, updating standard operating procedures, handling organisational politics when AI changes how people work.
Reality check: Technology is the easy part. People are the hard part.
This is why successful AI organisations don't just buy software—they build capabilities. And that takes investment in infrastructure, process, and people.
What Success Actually Looks Like
The 70/30 Split
Organisations achieving value from AI invest 70% of their AI resources in people and processes, not just technology. Change management is as important as model selection. The companies treating AI as purely a software purchase are the ones showing up in the failure statistics.
Technology is 30% of the solution. The other 70% is people, process, governance, and organisational change.
ROI Timeline Reality
Most organisations achieve satisfactory ROI within 2-4 years—much longer than typical 7-12 month software payback periods. Companies that moved early into GenAI adoption report $3.70 in value for every dollar invested, with top performers achieving $10.30 returns per dollar. But that return takes patience. Quick wins need to fund a longer journey.
High Performer Characteristics
AI high performers share common patterns:
- • They commit 20%+ of digital budgets to AI
- • They implement human oversight for critical applications
- • They set growth or innovation as objectives, not just efficiency
- • They redesign workflows rather than bolting AI onto existing processes
Half of AI high performers intend to use AI to transform their businesses, and most are redesigning workflows. They're not asking "where can we swap humans for AI?" They're asking "what becomes possible when thinking scales?"
The Risks Beyond Hallucinations
Hallucination gets all the press. The model makes something up. Everyone panics. Risk committees demand guardrails. But hallucination is just a symptom—a model making something up once, in a way that's visible.
The deeper risks are systemic, silent, and far more dangerous.
"Hallucination is just a model making something up once. The real risk is an AI system being wrong consistently and invisibly for months."
This is why observability isn't optional. This is why governance frameworks matter. This is why the on-costs are real. You're not just deploying a model—you're deploying a system that needs monitoring, maintenance, and accountability structures.
The Three Questions
Most organisations start AI projects by asking the wrong question. Here's how the progression should actually work.
Decision Path: From Wrong to Best
❌ Wrong Question
"Where can we put a chatbot?"
- • Technology-first thinking
- • Ignores the deployment matrix entirely
- • Leads directly to the 95% failure rate
Outcome: Pilots that don't scale, frustrated users, abandoned initiatives.
⚠️ Better Question
"Where do we waste human thinking time on work that's slow, repetitive, or queued up?"
- • Identifies Version 2 opportunities (100x thinking applied)
- • Finds the batch/queue sweet spots on the deployment matrix
- • Focuses on proven ROI patterns
Outcome: Efficiency gains, measurable ROI, incremental transformation.
✓ Best Question
"What thinking have we never even attempted because the coordination overhead was too high?"
- • Identifies Version 3 frontier (previously impossible work)
- • Finds opportunities that transform capability, not just efficiency
- • Points toward strategic differentiation
Outcome: New capabilities, competitive advantage, business transformation.
The wrong question leads to chatbot failures. The better question leads to efficiency gains. The best question leads to transformation. Most organisations never get past the first question—which is why most AI projects fail.
3 Questions to Ask Before Any AI Project
1. Where on the 2×2?
Map your use case: Latency tolerance × Error cost (a minimal code sketch follows this checklist)
- • If left side (low latency tolerance): human-led, AI-assisted only
- • If bottom-right (high latency tolerance, low error cost): prime AI territory
2. Which version of value?
- • Version 1: Same work, fewer people (highest failure rate)
- • Version 2: More thinking at same problems (proven ROI)
- • Version 3: Previously impossible work (transformative frontier)
3. What's the on-cost reality?
Infrastructure, monitoring, governance, maintenance, change management
Can this use case bear a 2-4 year ROI timeline with 70% of investment in people and process?
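Here is the sketch referenced in question 1, following the quadrant guidance used throughout this ebook. The left-side and bottom-right rules come from the checklist above; the remaining quadrant's guidance is an inference from the augmentation pattern, so treat the encoding as illustrative.

```python
# Sketch: map a use case onto the 2x2 (latency tolerance x error cost) and
# return the deployment guidance used in this chapter. Encoding is illustrative.
def deployment_guidance(latency_tolerance: str, error_cost: str) -> str:
    if latency_tolerance == "low":
        return "Human-led, AI-assisted only (AI does the pre-work, human answers)"
    if error_cost == "low":
        return "Prime AI territory: batch/async, check everything, human spot-checks"
    return "AI explores and drafts; human review gate before anything ships"

use_cases = {
    "live customer chat":      ("low", "high"),
    "overnight ticket triage": ("high", "low"),
    "contract risk review":    ("high", "high"),
}
for name, (latency, error) in use_cases.items():
    print(f"{name}: {deployment_guidance(latency, error)}")
```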
Implementation Best Practices
Key success factors include starting with high-impact processes, investing in change management, ensuring data quality, and planning for continuous improvement rather than one-time implementation.
Start Right
- • High-impact, low-complexity processes: Don't start with your hardest problem
- • Clear ROI that can be measured: Define success metrics before you begin
- • Business value focus: Not technology showcase
The Success Pattern
Tools that succeeded shared two traits: low configuration burden and immediate, visible value. In contrast, tools requiring extensive enterprise customisation often stalled at pilot stage.
- • Embed in workflows, adapt to context
- • Scale from narrow but high-value footholds
- • Avoid requiring extensive setup before users see value
The Shift
The organisations that succeed with AI aren't the ones with the biggest technology budgets. They're the ones who changed how they think about the problem.
From Version 1 Thinking
- ✗ "Where can we automate?"
- ✗ Competing with humans
- ✗ Technology as cost reduction
- ✗ 95% failure rate
To Version 3 Thinking
- ✓ "What becomes rational when cognition is abundant?"
- ✓ New capability creation
- ✓ Technology as capability infrastructure
- ✓ Transformative potential
This shift—from AI as expensive experiment to AI as infrastructure for thinking—changes everything. It changes budget conversations. It changes risk assessment. It changes which projects get greenlit and which get killed.
Most importantly, it changes the questions you ask.
The Question to Take Away
So here's the question worth asking yourself:
"What's one project you've never attempted because the coordination overhead was too high?"
Because it would take too many people. Because it would require too much time. Because the meetings alone would sink it. Because the cognitive overhead of keeping everyone aligned would consume more energy than the work itself.
That project—the one you've never attempted—might be your Version 3.
Not because AI will do it for you. But because AI changes the coordination economics. It changes what's rational to attempt. It changes the threshold where "too expensive to think about" becomes "worth exploring."
"Once you show them that second bucket, project ideas stop being 'let's bolt AI onto X' and start becoming 'what would we do if cognition was essentially abundant?'"
That's the shift. That's the opportunity. And that's the question worth answering.
Ready to Identify Your Version 3?
The organisations transforming with AI aren't the ones with the biggest budgets. They're the ones asking better questions.
Start with the deployment matrix. Map your use cases. Identify where you're wasting thinking time. Then ask what you've never attempted—and explore whether AI changes the economics enough to make it rational.
Scott Farrell helps organisations move from Version 1 automation thinking to Version 3 capability building. Connect on LinkedIn to explore how AI changes what's possible for your organisation.
Key Takeaways
- • AI on-costs are substantial: infrastructure, monitoring, governance, maintenance, and change management all add up
- • Success requires 70% investment in people and process, with 2-4 year ROI timelines
- • The real risks aren't hallucinations—they're systemic wrongness and silent drift
- • Wrong question: "Where can we put a chatbot?" Better question: "Where do we waste thinking time?" Best question: "What thinking was never feasible before?"
- • Start high-impact/low-complexity, redesign workflows, focus on business value not technology showcase
- • The shift from "AI as automation" to "AI as infrastructure for thinking" changes which projects become rational to attempt
References & Sources
This ebook draws on enterprise AI research, industry surveys, academic studies, and practitioner insights compiled through late 2025. Where statistics or frameworks are cited, the primary source is noted. The author's interpretive frameworks integrate patterns observed across multiple engagements and sources.
Primary Research & Global Surveys
McKinsey Global Survey on AI, November 2025
State of AI adoption statistics, high-performer characteristics, workflow redesign data
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
MIT Media Lab: "The State of AI in Business 2025"
95% AI pilot failure rate analysis, P&L impact measurement, workflow alignment findings
https://complexdiscovery.com/why-95-of-corporate-ai-projects-fail-lessons-from-mits-2025-study/
S&P Global Market Intelligence: Enterprise AI Survey 2025
42% abandonment rate, POC-to-production gap data, regional adoption patterns
https://workos.com/blog/why-most-enterprise-ai-projects-fail-patterns-that-work
RAND Corporation: AI Project Failure Analysis
AI projects fail at 2× rate of non-AI tech, root cause analysis, misalignment patterns
https://www.rand.org/pubs/research_reports/RRA2680-1.html
Google Cloud ROI Study, September 2025
74% of executives achieved ROI within first year, adoption benchmarks
https://www.punku.ai/blog/state-of-ai-2024-enterprise-adoption
Consulting Firms & Industry Analysis
BCG: Enterprise AI Capabilities Assessment (Late 2024)
4% cutting-edge adoption, 74% yet to show tangible value despite investment
https://agility-at-scale.com/implementing/roi-of-enterprise-ai/
McKinsey: Personalisation Value Analysis
$1 trillion US market shift, 10-15% revenue lift, 40% revenue advantage for fast growers
https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying
Gartner: AI Software Spending Forecast
$300 billion by 2027, CFO accountability expectations
https://agility-at-scale.com/implementing/roi-of-enterprise-ai/
NTT Data: GenAI Deployment Failure Analysis
70-85% failure rate findings, retail vs custom AI tools comparison
https://www.nttdata.com/global/en/insights/focus/2024/between-70-85p-of-genai-deployment-efforts-are-failing
Technical Research & AI Systems
Anthropic: Multi-Agent Research Systems
90.2% performance improvement with orchestration, token economics (4× agents, 15× multi-agent)
https://www.anthropic.com/research/building-effective-agents
IESE: Chain-of-Thought Reasoning Breakthrough
Extended thinking, test-time compute scaling, small model outperforming 14× larger models
https://blog.iese.edu/artificial-intelligence-management/2024/chain-of-thought-reasoning-the-new-llm-breakthrough/
Hugging Face: Test-Time Compute Analysis
Inference-time reasoning, o1 architecture insights
https://huggingface.co/blog/Kseniase/testtimecompute
MIT GenAI & HCI: Brain Cache Cognitive Exoskeleton
AI as second brain, cognitive augmentation mechanisms
https://generativeaiandhci.github.io/papers/2025/genaichi2025_51.pdf
Galileo AI: Real-Time vs Batch Monitoring for LLMs
Latency-accuracy trade-offs, false positive rate analysis
https://galileo.ai/blog/llm-monitoring-real-time-batch-approaches
Zen van Riel: Real-Time vs Batch Processing Architecture
40-60% cost savings analysis, infrastructure comparison
https://zenvanriel.nl/ai-engineer-blog/should-i-use-real-time-or-batch-processing-for-ai-complete-guide/
Customer Experience & Chatbot Research
Forbes / UJET: Chatbot Frustration Survey
72% "waste of time", 78% escalate to human, 63% no resolution
https://www.forbes.com/sites/chriswestfall/2022/12/07/chatbots-and-automations-increase-customer-service-frustrations-for-consumers-at-the-holidays/
Johns Hopkins Carey: Hurdles to AI Chatbots
Gatekeeper aversion, priority queue solutions
https://carey.jhu.edu/articles/hurdles-ai-chatbots-customer-service
Nature: Consumer Trust in AI Chatbots
Service failure attribution, anthropomorphism effects
https://www.nature.com/articles/s41599-024-03879-5
WorkHub: Top 7 Reasons Chatbots Fail
Weak escalation protocols, incomplete knowledge bases
https://workhub.ai/chatbots-fail-in-customer-service/
Human Augmentation & Medical AI
EY: Human-Machine Hybrid Economy
Diagnostic sensitivity improvement (72% → 80%), AI-enhanced medical performance
https://www.ey.com/en_us/megatrends/how-emerging-technologies-are-enabling-the-human-machine-hybrid-economy
AI Magicx: Hyper-Personalisation at Scale (2025)
62% engagement increase, 80% conversion improvement
https://aimagicx.com/blog/hyper-personalization-ai-customer-experiences-2025/
e-Novia: Human Augmentation Physical AI
German Bionic Exia case study, cognitive-physical augmentation parallels
https://e-novia.it/en/news/human-augmentation-technologies-physical-ai-industry-healthcare/
Group Dynamics & Decision-Making
NYU Steinhardt: Groupthink as System
Four dimensions of defective decisions, cohesion-performance relationship
https://wp.nyu.edu/steinhardt-appsych_opus/groupthink/
ANZSOG: Effective Committee Work
Time constraints on knowledge sharing, preference negotiation patterns
https://anzsog.edu.au/app/uploads/2022/06/10.21307_eb-2018-002.pdf
ASPPA: Investment Committee Groupthink
Committee dynamics impact on portfolio performance
https://www.asppa-net.org/news/2019/5/how-investment-committees-can-avoid-groupthink/
Case Studies & ROI Examples
Klarna AI Assistant Case Study
$40M annual benefit, 700-agent equivalent workload, ticket-based (not live chat) deployment
https://www.articsledge.com/post/ai-software-business
Nasdaq Data Quality Implementation
90% reduction in time on data quality issues, $2.7M savings
https://www.montecarlodata.com/blog-ai-observability/
Enterprise AI Investment Breakdown
Average $6.4M annual spend across software, talent, infrastructure, training
https://www.secondtalent.com/resources/ai-adoption-in-enterprise-statistics/
LeverageAI / Scott Farrell
Practitioner frameworks and interpretive analysis developed through enterprise AI transformation consulting. These materials inform the conceptual frameworks presented in this ebook.
Discovery Accelerators: The Path to AGI Through Visible Reasoning Systems
Three-layer architecture (Director AI, Council of Engines, Chess-Style Reasoning), reasoning-guided search patterns
https://leverageai.com.au/wp-content/media/Discovery_Accelerators_The_Path_to_AGI_Through_Visible_Reasoning_Systems_ebook.html
Stop Replacing People, Start Multiplying Them: The AI Augmentation Playbook
Augmentation flywheel concept, week-by-week transformation patterns
https://leverageai.com.au/wp-content/media/Stop_Replacing_People_Start_Multiplying_Them_The_AI_Augmentation_Playbook_ebook.html
The Team of One: Why AI Enables Individuals to Outpace Organisations
Multi-agent performance data, marketplace-of-one economics, solopreneur capability analysis
https://leverageai.com.au/wp-content/media/The_Team_of_One_Why_AI_Enables_Individuals_to_Outpace_Organizations_ebook.html
Stop Automating. Start Replacing.
Cost structure flip concept, per-customer economics analysis
https://leverageai.com.au/wp-content/media/Stop_Automating_Start_Replacing_ebook.html
The Agent Token Manifesto
Hypersprint concept, overnight iteration patterns, agent economics
https://leverageai.com.au/wp-content/media/The_Agent_Token_Manifesto.html
The AI Think Tank Revolution
Multi-agent reasoning systems, specialised AI council patterns
https://leverageai.com.au/wp-content/media/The_AI_Think_Tank_Revolution_ebook.html
Note on Research Methodology
This ebook synthesises research from multiple source categories: global enterprise surveys (McKinsey, BCG, S&P Global), academic research (MIT, NYU, Johns Hopkins), industry analysis (Gartner, NTT Data), and practitioner insights. Statistics and quotations are attributed to their primary sources throughout the text.
The author's frameworks—including the "Maximising AI Cognition and AI Value Creation" framing, the 2×2 deployment matrix, "hyper sprints," "marketplace of one," and "cognitive exoskeleton" concepts—represent interpretive synthesis developed through enterprise AI consulting engagements. These are presented as the author's analytical lens rather than as external research findings.
Research compiled: November–December 2025
Note: Some linked resources may require subscription access. URLs were verified at time of publication but may change.