Why AI projects are failing – explained
Published on 2025-10-10 10:15
TL;DR
- Only 5% of enterprises consistently extract AI value; 42% abandon most projects after POC (BCG, S&P Global)
- The culprit: running 2005 procurement against 2025 technology, plus vendors selling “AI-washing”
- The fix: replace “requirements → RFP → install” with “hypotheses → experiments → operating model” and composable architecture
Full eBook: https://leverageai.com.au/wp-content/media/The_Death_of_Shelf_Software_and_the_Rise_of_Composable_AI_ebook.html
You’re Not Imagining It
If your organisation has bought AI software only to watch staff spend more time correcting it than using it, you’re in good company—miserable company, but not alone.
BCG’s 2025 study found only ~5% of companies are consistently getting measurable AI value. The rest are dabbling, stalling, or scaling noise. S&P Global reports 42% of firms have ditched most of their AI projects. Gartner warned a year ago that the majority of gen-AI pilots die after proof-of-concept, and regulators have started prosecuting “AI-washing” claims—a signal the market incentives are deeply misaligned.
Two forces are colliding: enterprises are running an old procurement playbook against a fundamentally new technology, and many vendors are slapping AI badges on yesterday’s software.
Why the Old Playbook Fails
Traditional software procurement worked because requirements were stable and vendor products were mature. You’d write an RFP, evaluate three shortlisted platforms, negotiate a contract, and deploy over 6–18 months. The software did what the spec promised, or at least close enough.
AI doesn’t work that way. Models are perishable—GPT-4 becomes obsolete when GPT-5 ships. Use cases emerge through experimentation, not requirements gathering. And “the AI” isn’t a thing you install; it’s a capability you compose, instrument, and retrain continuously.
When you freeze requirements and sign a three-year lock-in, you’ve ossified your capability the moment you bought it.
The vendor problem
Legacy vendors started integrating AI 18 months ago when models were poor and patterns were immature. By the time their products ship today, the “AI” inside is already two generations behind what you see in the press. Some didn’t even upgrade the product—they just rebranded existing features as “AI-powered.” The FTC and SEC are now actively swatting these claims.
Four Mental Shifts for AI That Actually Helps
1. From “requirements → RFP → install” to “hypotheses → experiments → operating model”
Treat each AI use case as a falsifiable hypothesis with a measurable baseline: time saved, cost reduced, quality improved. Before anything touches production, run small instrumented trials with an evaluation harness—gold test data, offline scoring, shadow deployments. If the AI can’t beat the baseline, kill it fast.
This isn’t agile theater. It’s scientific method applied to software buying.
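Here’s a minimal sketch of what such an evaluation harness might look like. The gold cases, the exact-match scorer, and the baseline figure are all placeholders for your own data and metrics, not a prescribed setup:

```python
# Minimal offline eval harness: score a candidate AI system against gold test
# data and a measured baseline before anything touches production.
# The gold set, scorer, and baseline number below are illustrative placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldCase:
    input: str        # what the system is asked to do
    expected: str     # what a correct answer looks like

def evaluate(run_ai: Callable[[str], str], gold: list[GoldCase]) -> float:
    """Return the fraction of gold cases the AI gets right (exact match here;
    swap in your own task-specific scorer)."""
    correct = sum(1 for case in gold if run_ai(case.input).strip() == case.expected.strip())
    return correct / len(gold)

if __name__ == "__main__":
    gold = [
        GoldCase("Classify: 'invoice overdue 90 days'", "escalate"),
        GoldCase("Classify: 'thanks, payment received'", "close"),
    ]
    baseline_accuracy = 0.82                               # measured from the current manual process
    ai_accuracy = evaluate(lambda text: "escalate", gold)  # stub model, for illustration only

    # The hypothesis is falsifiable: if the AI can't beat the baseline, kill it fast.
    print(f"AI: {ai_accuracy:.0%} vs baseline: {baseline_accuracy:.0%}")
    print("PASS" if ai_accuracy > baseline_accuracy else "KILL")
```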
2. From vendor features to capability audits
Stop chasing feature lists. Audit these instead:
- Data access and quality: Can the vendor work with your actual data, or only sanitised demos?
- Eval methodology: Do they provide offline test results on tasks like yours, with failure modes disclosed?
- Safety and controls: Can you route low-confidence cases to human review automatically?
- Model swap-out policy: When GPT-6 arrives, can you upgrade without renegotiating your contract?
- Observability: Can you inspect prompts, inputs, outputs, costs, and drift in real time?
Demand model cards, red-team results, and proof the “AI” isn’t just brand paint. Gartner is now flagging “agent-washing” as a category of risk.
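For the observability item in particular, one way to make the requirement concrete is to log every call as a structured record you own (prompt, output, cost, confidence, latency) so drift and spend stay inspectable without asking the vendor’s permission. A rough sketch; the field names and the JSON-lines sink are assumptions, not any vendor’s schema:

```python
# Sketch of a per-call observability record you control, independent of the vendor.
# Field names and the JSON-lines sink are illustrative choices, not a standard schema.

import json
import time
from dataclasses import dataclass, asdict

@dataclass
class AICallRecord:
    timestamp: float
    model: str            # which model/version served the call
    prompt: str           # what was sent
    output: str           # what came back
    confidence: float     # vendor- or self-reported confidence, if available
    cost_usd: float       # metered cost of the call
    latency_ms: float

def log_call(record: AICallRecord, path: str = "ai_calls.jsonl") -> None:
    """Append one structured record; downstream jobs can track drift and spend."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_call(AICallRecord(
    timestamp=time.time(),
    model="vendor-model-2025-10",
    prompt="Summarise this support ticket for the billing team.",
    output="Customer reports duplicate billing on the latest invoice.",
    confidence=0.91,
    cost_usd=0.0042,
    latency_ms=850.0,
))
```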
3. From blanket human review to risk-based oversight
“Check everything” kills productivity; “check nothing” invites disaster. The answer is triage: route low-risk, high-confidence cases straight through, and escalate only when confidence drops, stakes rise, or anomalies trigger.
Many firms report early losses from flawed outputs and compliance misses—classic symptoms of no triage. Smart oversight isn’t a full-time human in every loop; it’s conditional escalation based on risk.
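A sketch of that triage logic, assuming the model exposes a confidence score and each task carries a risk tier; the thresholds are placeholders to tune against your own error data:

```python
# Conditional escalation: auto-approve only low-risk, high-confidence cases;
# everything else goes to human review. Thresholds are illustrative.

from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def route(confidence: float, risk: Risk) -> str:
    """Decide whether an AI output ships straight through or escalates."""
    if risk is Risk.HIGH:
        return "human_review"              # high stakes always get human eyes
    if risk is Risk.MEDIUM and confidence < 0.95:
        return "human_review"
    if confidence < 0.80:
        return "human_review"              # low confidence always escalates
    return "auto_approve"

print(route(confidence=0.97, risk=Risk.LOW))   # auto_approve
print(route(confidence=0.97, risk=Risk.HIGH))  # human_review
print(route(confidence=0.70, risk=Risk.LOW))   # human_review
```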
4. From off-the-shelf vs bespoke to “composable bespoke”
Pure bespoke drowns you in maintenance. Pure shrink-wrap leaves value on the table. The pragmatic centre:
- Buy the undifferentiated plumbing: identity, data platforms, observability, security, policy enforcement.
- Assemble with open, swappable model adapters so tomorrow’s model replaces today’s without a rewrite.
- Compose thin, task-specific micro-apps or agent workflows around your data and processes. Spin them up for a project and throw them away afterwards; either way they live on a solid platform.
Leaders in the 5% cohort are doing versions of this—embedding AI into core workflows, not stapling it onto old ones.
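One common way to get that swappability is a thin adapter interface your micro-apps call, so the back-end model becomes a configuration choice rather than a rewrite. A sketch under that assumption; the provider classes and method signature are hypothetical stand-ins, not real SDK calls:

```python
# Thin adapter layer: micro-apps depend on this interface, not on any one vendor's SDK.
# The concrete adapters below are stubs; real ones would wrap each provider's actual client.

from typing import Protocol

class ModelAdapter(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorAAdapter:
    def complete(self, prompt: str) -> str:
        # Call vendor A's API here (omitted); return its text completion.
        return f"[vendor-a] {prompt[:40]}..."

class VendorBAdapter:
    def complete(self, prompt: str) -> str:
        # Tomorrow's model: same interface, different back end.
        return f"[vendor-b] {prompt[:40]}..."

def summarise_ticket(model: ModelAdapter, ticket_text: str) -> str:
    """Task-specific micro-app logic: knows nothing about which vendor sits behind it."""
    return model.complete(f"Summarise for the support team: {ticket_text}")

# Swapping models is a configuration change, not a rewrite.
print(summarise_ticket(VendorAAdapter(), "Customer reports duplicate billing on an invoice."))
print(summarise_ticket(VendorBAdapter(), "Customer reports duplicate billing on an invoice."))
```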
What This Implies for Buying and Building
Procurement becomes continuous. Models are perishable. Structure contracts and architecture so you can swap components quarterly without renegotiating your soul.
POCs aren’t the goal; payback is. Gate funding on time-to-first-value and unit economics—dollars saved per 1,000 requests after infrastructure and review costs. Gartner and S&P’s failure stats come from organisations that scaled proofs, not value.
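To make that unit-economics gate concrete, here is a back-of-envelope calculation of net dollars saved per 1,000 requests after model, review, and platform costs. Every figure is an illustrative placeholder to replace with your own measurements:

```python
# Back-of-envelope unit economics per 1,000 requests. All figures are illustrative.

requests = 1_000
minutes_saved_per_request = 4          # measured against the manual baseline
loaded_labour_cost_per_minute = 1.00   # fully loaded staff cost, $/minute

gross_savings = requests * minutes_saved_per_request * loaded_labour_cost_per_minute

model_cost = requests * 0.004          # metered inference cost per request
escalation_rate = 0.15                 # share of requests routed to human review
review_minutes = 6                     # human minutes per escalated request
review_cost = requests * escalation_rate * review_minutes * loaded_labour_cost_per_minute
platform_cost = 300                    # amortised infrastructure/observability per 1k requests

net = gross_savings - model_cost - review_cost - platform_cost
print(f"Net savings per 1,000 requests: ${net:,.0f}")   # fund only if this clears your hurdle rate
```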
Change management is product work. The win comes from redesigned workflows, new roles (AI product owner, prompt evaluator), and training. BCG’s “future-built” firms invest heavily here, and it’s the difference between demo magic and durable ROI.
Quick Sniff-Tests to Avoid AI-Washing
- Can the vendor show offline evals against your tasks, with failure modes disclosed?
- Is there a confidence signal and policy routing (when to escalate to humans)?
- What’s the model exit plan? (“We use X” without an adapter is a lock-in trap.)
- Where’s the observability—prompts, inputs, outputs, costs, drift—and who owns it?
- Are they promising headcount cuts before they can demonstrate consistent quality? The FTC is watching those claims.
FAQ
Q: Isn’t this just “build everything yourself”? A: No. You buy the platform and plumbing. You compose the task-specific layer. It’s faster and cheaper than pure bespoke, and far more adaptive than shrink-wrap.
Q: What if we’ve already signed a three-year vendor deal? A: Negotiate quarterly model refresh clauses, or architect an adapter layer so you can swap the back-end AI without ripping out the vendor.
Q: How do we know if our use case is even viable? A: Run a one-week eval with gold test data. If the AI can’t beat your baseline (current process time, cost, or quality), don’t scale it.
Q: Who owns this in the organisation? A: Typically a product manager with AI literacy, supported by engineering (for eval harnesses) and business owners (for baselines and success metrics). It’s not an IT project.
Bottom Line
Your instinct is right: the old RFP-then-install ritual is mismatched to AI’s pace and variability, and a lot of “AI” on offer really is yesterday’s product with today’s buzzwords.
But “rip it all out and go bespoke” is a false binary. The durable pattern is composable bespoke on a stable foundation—treat AI as a living capability with experiments, guardrails, and swappability baked in.
The 5% who are winning didn’t buy better AI. They bought differently.
What’s one procurement assumption you’re rethinking after reading this? Drop it in the comments.
Sources: BCG AI Value Gap (2025), S&P Global AI Adoption Study, Gartner Hype Cycle for GenAI, FTC AI-Washing Crackdown