Capture Was Never the Bottleneck

Scott Farrell · July 5, 2026 · scott@leverageai.com.au

She wrote down every question her staff asked, and every answer she gave, for ten years. Hundreds of pages. Her staff still asked. What failed wasn’t her discipline — it was a step that didn’t exist yet.

Scott Farrell · LeverageAI · leverageai.com.au · 9 min read

Companion field guide: The SMB Knowledge Play: Compile the Business You Already Wrote Down. It covers why shared drives rot by design, the read-only fix, the dental story in full, manuals that are never stale, role-shaped views, seed wikis, the split-screen copilot, and the floor where trust is won.

TL;DR

  • Knowledge management fails at compilation, not capture. Writing answers down was never the missing step — compiling them into something that answers back was.
  • A repeated question is a cache miss, not a comprehension failure. Staff aren’t stupid; the owner transmitted knowledge once, into a mailbox, unindexed — and billed the retrieval failure to their intelligence.
  • Ten years of logged questions isn’t a monument to failure — it’s demand-side telemetry: a frequency-weighted map of what your business actually needs to know, waiting to be compiled.

The Word document

A few years back I was building phone agents for a dental practice. The owner — one of the most organised people I’ve ever worked with — went to great lengths to tell me how many times she had to repeat herself to staff. It had worn her down so thoroughly that she’d started running a weekly meeting just to batch the questions, because she hated getting them all week. Then she’d sit in the meeting and feel it happening anyway: she was answering the same questions she’d already answered. Last month. Last year. Five years ago.

So she did what a disciplined person does. She started writing them down. Every question a staff member asked, and the answer she gave, went into a Word document.

When we started on the phone agents, she said: “I know everything it needs to answer” — and sent me the document. She had been keeping it for over ten years. It ran to hundreds and hundreds of pages.

And here is the detail that makes the story worth telling: her staff still asked her the questions. The document grew for a decade, and the weekly meeting never got shorter. She did everything the knowledge-management playbook says to do — capture diligently, write it all down, keep it in one place — and it changed nothing.

Capture was never the bottleneck. Compilation is. She did everything right except the one step that didn’t exist yet.

It would be easy to file this as a story about one unusually persistent practice owner. It isn’t. She just produced the cleanest evidence I’ve ever seen of a failure that is running, right now, in almost every small business — usually without the document to prove it.

Whose failure is it?

Ask the average owner why staff keep asking questions they’ve already answered, and you’ll get a diagnosis about people: they don’t listen, they don’t read, they don’t retain. She wondered the same thing — how could they still not know, when she’d answered it, in writing, sometimes several times?

But run the failure backwards and it looks very different. The knowledge was transmitted once — into an email, a meeting, a Word document on a shared drive. From the staff member’s side, the experience is: you emailed it to me, and it’s stuck in my inbox. You did tell me — I just have no way to find it again. There is no structured way back to the answer. The only reliable index to ten years of accumulated answers was the woman who wrote them.

The reframe

A repeated question is a cache miss, not a comprehension failure. The organisation failed to serve the answer — and billed the failure to the asker’s intelligence.

Staff aren’t stupid. In software terms, every one of those repeated questions is a cache miss: the answer exists, the lookup fails, so the request falls through to the slowest, most expensive backend in the building — the owner. Every re-explanation is the organisation paying interest on a missing map.
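
The cache-miss framing can be made concrete with a small sketch. Everything here is illustrative: the `Practice` class, its field names, and the placeholder answer are assumptions for the example, not anything the practice actually ran. It compares a practice that only logs each miss with one that also fills the cache after the first miss.

```python
from dataclasses import dataclass, field


def normalise(question: str) -> str:
    """Lower-case and collapse whitespace so trivially different phrasings match."""
    return " ".join(question.lower().split())


@dataclass
class Practice:
    owner_answers: dict[str, str]      # what only the owner can currently serve
    fill_cache: bool                   # False = log the miss, True = also fix it
    cache: dict[str, str] = field(default_factory=dict)
    miss_log: list[str] = field(default_factory=list)
    owner_interruptions: int = 0

    def ask(self, question: str) -> str:
        key = normalise(question)
        if key in self.cache:              # hit: served without the owner
            return self.cache[key]
        self.miss_log.append(key)          # miss: record it...
        self.owner_interruptions += 1      # ...and fall through to the owner
        answer = self.owner_answers[key]   # sketch only: unknown questions raise KeyError
        if self.fill_cache:
            self.cache[key] = answer
        return answer


# Hypothetical question and placeholder answer, for illustration only.
answers = {"which hicaps code do i use?": "placeholder answer"}
log_only = Practice(answers, fill_cache=False)
with_cache = Practice(answers, fill_cache=True)
for _ in range(3):
    log_only.ask("Which HICAPS code do I use?")
    with_cache.ask("Which HICAPS code do I use?")

print(log_only.owner_interruptions, with_cache.owner_interruptions)  # 3 1
```

In the log-only version the miss log grows with every repeat and the owner is interrupted every time. With the cache filled, the same three questions cost her one interruption.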

And the pattern is universal enough to have numbers attached. In one benchmark of 1,500 office workers, 83% said they had recreated a document that already existed because they couldn’t find it, and 68% said finding the right version of a document is at least sometimes difficult [1]. McKinsey’s classic estimate puts knowledge workers at roughly 1.8 hours a day searching for and gathering information, the equivalent of hiring five people and having four show up while the fifth spends the week looking for answers [2]. Panopto measured the specifically human version of the failure: employees lose 5.3 hours every week either waiting for knowledge from colleagues or recreating knowledge that already exists [3].

Read those numbers again with the dental practice in mind. “Waiting for vital information from a colleague” is a polite, enterprise-survey way of describing a receptionist ringing the owner on her day off to ask which code to put through the HICAPS machine. The cost of a missing map doesn’t show up as a line item. It shows up as the owner never really being off.

She was the retrieval layer

Look at what the practice actually built, in systems terms. There was a corpus: ten years of questions and answers, plus the usual sediment — emails, procedure notes, documents on the shared drive. And there was a query interface: her. Staff asked; she retrieved. She held the index in her head, resolved ambiguous questions, knew which of three contradictory answers was current, and served results in seconds.

She was the practice’s retrieval layer. Human RAG.

She burned out on being the retrieval layer and started logging the cache misses instead of fixing the cache.

That’s the precise, unsentimental description of the Word document. It’s a cache-miss log — a record of every time the practice’s knowledge infrastructure failed to serve an answer and the request fell through to her. Logging the misses felt like progress, because capture is visible and effortful and virtuous. But a log of misses doesn’t fix a cache. Appending page 400 to a document nobody can read doesn’t change what happens when the next new receptionist needs the cancellation policy.

Here’s the uncomfortable generalisation: your business is running the same architecture. Maybe without the document — most owners never get that disciplined, which is why her artifact is so valuable — but with the same human retrieval layer. If you are the person who gets asked, you are the index. The weekly “quick questions”, the interruptions, the calls on your day off: that’s what it feels like to be a query interface with no cache in front of you.

The missing step has a name

What she built was a raw corpus. What she needed was a knowledge base. The difference between those two things is a step almost nobody names, so here it is: compilation.

Capture (what she did)
  • Append every Q&A to the document
  • No de-duplication — the same question answered eleven times, eleven ways
  • No superseding — the 2016 answer and the 2024 answer sit side by side, both looking current
  • No index, no map — findable only by the person who wrote it

Compilation (the missing step)
  • Merge duplicates into one canonical answer
  • Chain versions — the current answer on top, history preserved underneath, dated
  • Resolve contradictions, or flag them for the one person who can
  • Build the map — so the answer is reachable by someone who doesn’t know what it’s called
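
The four compilation steps above can be sketched in a few lines, under heavy assumptions. The `Entry` shape, the sample questions and the dollar amounts are invented for illustration. `topic_key` is a crude text normaliser standing in for the machine comprehension a real build would need to recognise that two differently worded questions are the same question.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    question: str
    answer: str
    answered_on: date


def topic_key(question: str) -> str:
    """Crude stand-in for comprehension: ignore case, punctuation and spacing."""
    cleaned = "".join(c if c.isalnum() else " " for c in question.lower())
    return " ".join(cleaned.split())


def compile_log(entries: list[Entry]) -> dict[str, dict]:
    grouped: dict[str, list[Entry]] = defaultdict(list)
    for entry in entries:                      # merge duplicates under one key
        grouped[topic_key(entry.question)].append(entry)

    compiled = {}
    for key, group in grouped.items():
        chain = sorted(group, key=lambda e: e.answered_on, reverse=True)
        newest = chain[0]
        # Two different answers on the newest date cannot be ordered: flag them.
        tied = {e.answer for e in chain if e.answered_on == newest.answered_on}
        compiled[key] = {
            "current": newest.answer,                               # on top
            "as_of": newest.answered_on,
            "history": [(e.answered_on, e.answer) for e in chain[1:]],  # dated, underneath
            "needs_review": len(tied) > 1,
        }
    return compiled


# Invented sample entries, not taken from the real document.
raw_log = [
    Entry("What deposit do we take?", "Take a $50 deposit.", date(2016, 3, 1)),
    Entry("what deposit do we take", "Take a $100 deposit.", date(2024, 2, 1)),
    Entry("What deposit do we take??", "Take a $50 deposit.", date(2019, 7, 1)),
    Entry("Where is the cancellation policy?", "In the front-desk folder.", date(2020, 1, 1)),
]
compiled = compile_log(raw_log)
print(compiled["what deposit do we take"]["current"])  # Take a $100 deposit.
```

The superseded 2016 and 2019 answers stay in `history` rather than being deleted, which is the "chain versions" step. The map-building step is not covered by this sketch.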

Compilation is what turns hundreds of pages nobody can read into something that answers back. No human was ever going to do it. Reading a decade of accumulated Q&A, de-duplicating it, deciding which of seven versions of the payment-plan answer is current, and cross-referencing it against the procedures in the shared drive is weeks of expert-grade tedium, which is exactly why it never happened, in her practice or anyone else’s. Until very recently, reading at that scale had no affordable unit price. Now it does. Machine comprehension became cheap enough to run over a whole business’s accumulated text, which means the step that “didn’t exist yet” now exists [8].

One honest boundary: not everything the practice knows was ever written down. Some expertise lives in hands and habits; Polanyi’s old line is that we know more than we can tell [4]. Compilation can only compile what left a trace. But that boundary is much further out than it looks: after ten years, the traces are everywhere. The answers are in the document, the exceptions are in the email threads, the procedures are in the attachments. The expert’s job stops being “answer every question, forever, serially” and becomes “review the compiled draft and correct it where it’s wrong”. Editing a draft is far cheaper than re-answering from scratch.

“But we have search”

The standard objection: this is what search is for. She could have used Ctrl-F. The practice could have bought a search tool, or one of the copilots now bundled into every office suite.

Search fails here for a reason worth being precise about. Search answers “where is the thing I can name?” — and the person who can name the thing was never the problem. The new receptionist doesn’t know the document calls it “gap payment variance”; she knows a patient is upset at the front desk. The deeper failure of every shared drive is that you don’t even know what to look for. No search box fixes that, because the failure happens before the query is typed.

And retrieval tools have a second, quieter limitation: they find; they never conclude. Point one at the practice’s corpus and it will faithfully retrieve all eleven answers to the deposit question (the 2016 one, the 2019 one, the one that changed after the incident nobody wrote up) and leave “which of these is current?” exactly where it has always lived: with the person asking. Microsoft’s own governance guidance for Copilot concedes the point: answers degrade when the underlying content is stale or cluttered, and users “may receive outdated results”; the prescribed fix is to clean up your content first [5]. The tool inherits the mess; it doesn’t resolve it. I’ve written elsewhere about why this is structural for every vendor copilot, in “Every Copilot Is Myopic”, but the practical test is one sentence:

The one-sentence tool test

Does it conclude, or does it just find? If “which one is current” is still your problem after the tool answers, you bought a librarian’s trolley, not a librarian.

Deciding which of seven answers is canonical isn’t retrieval. It’s synthesis — judgement, applied once, recorded with its receipts. That’s precisely the work compilation does up front, so nobody has to redo it at nine on a Tuesday with a patient at the counter.

The document was worth more than she knew

Now the redemption arc, because the ten years weren’t wasted — she just mislabelled the asset.

She thought she’d written an answer book, and as an answer book it failed. But read the document a different way and it’s something rarer: ten years of real questions, frequency-weighted. Which topics come up every month. Which policy staff have never once absorbed from the manual. Which questions spike with every new hire, and which only appear when a particular machine misbehaves. That’s not a knowledge base. That’s telemetry — a decade of measurements of where the practice’s knowledge demand actually lands.

Most knowledge bases are built supply-side, from what the owner thinks matters. She accidentally recorded the other half.

Anyone who has ever written documentation knows the supply-side trap: you document what seems important to you, the expert, and the manual answers questions nobody asks while missing the ones everybody does. Her document inverts that. Every entry exists because a real person, doing the real job, actually needed it, and needed it badly enough to interrupt the boss. Compile a knowledge base with that log in hand and you know exactly which twenty pages are load-bearing, which answers must be bulletproof and current, and which hundred pages of dutiful procedure can stay thin. The questions are the map of what matters [9].
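
One way to read a question log as demand-side telemetry is simply to count it. The sketch below assumes each logged question has already been reduced to a topic label; the topics and counts are made up, and the 80% coverage threshold is an arbitrary illustration rather than a recommendation.

```python
from collections import Counter

# Hypothetical log: one topic label per question asked (invented data).
question_log = (
    ["hicaps code"] * 5
    + ["deposit"] * 3
    + ["cancellation policy"]
    + ["autoclave error"]
)

demand = Counter(question_log)


def load_bearing(demand: Counter, coverage: float = 0.8) -> list[str]:
    """Smallest set of most-asked topics that accounts for `coverage` of all questions."""
    total = sum(demand.values())
    picked: list[str] = []
    covered = 0
    for topic, count in demand.most_common():   # most frequent first
        if covered / total >= coverage:
            break
        picked.append(topic)
        covered += count
    return picked


print(load_bearing(demand))  # ['hicaps code', 'deposit']
```

In this toy log, two topics account for eight of ten questions. Those are the answers that have to be current and correct; the long tail can stay thin.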

Run the arithmetic that log implies, even roughly. A practice with eight staff, each losing anything like the surveyed 5.3 hours a week to waiting for or recreating knowledge [3], loses about 42 hours a week in total. That is a phantom full-timer whose entire job is asking and re-answering, before counting the owner’s side of every exchange, and before counting the churn tax: dental front-desk turnover runs 30–40% a year, roughly double the all-industry average, so the practice re-pays the whole onboarding cost of those questions with nearly every new face at the desk [6]. A decade of that is not a rounding error. It’s one of the largest unbilled line items in the business.
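
Worked through, the rough arithmetic looks like this. The eight staff and the 5.3 hours come from the text; the 38-hour full-time week and the 48 working weeks a year are assumptions added only to put the total on a familiar scale. The 5.3-hour figure was measured on U.S. knowledge workers, so applying it to a dental front desk is an extrapolation.

```python
STAFF = 8
HOURS_LOST_PER_PERSON_PER_WEEK = 5.3   # survey figure cited in the text [3]
FULL_TIME_WEEK_HOURS = 38.0            # assumption: a standard full-time week
WORKING_WEEKS_PER_YEAR = 48            # assumption: allows for leave and holidays

weekly_hours_lost = STAFF * HOURS_LOST_PER_PERSON_PER_WEEK
full_time_equivalents = weekly_hours_lost / FULL_TIME_WEEK_HOURS
yearly_hours_lost = weekly_hours_lost * WORKING_WEEKS_PER_YEAR

print(f"{weekly_hours_lost:.1f} hours a week")          # 42.4 hours a week
print(f"{full_time_equivalents:.2f} full-time roles")   # 1.12 full-time roles
print(f"{yearly_hours_lost:.0f} hours a year")          # 2035 hours a year
```

Even if the real loss were half the surveyed figure, the total would still be about half a full-time role, before the owner’s time or turnover is counted.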

What compiling actually looks like

This is the part that has changed since she started typing. The compilation step is now a build process you can run over the business you already have — without moving a single file:

Compilation, concretely

  • Leave everything where it lies. The drive, the inboxes, the Q&A document — nothing migrates, nothing gets renamed, nobody re-files a decade of history. The compiled layer holds meaning and pointers; the originals stay put. Turn it off, and nothing broke.
  • Read it all once. Machine comprehension does the weeks of expert-grade tedium: merge the eleven deposit answers, chain the versions by date, keep the superseded ones visible underneath instead of letting them ambush search results.
  • Conclude, with receipts. Where judgement is genuinely needed — which of seven versions is canonical — the compiled answer records why (newest, most-referenced, authored by the process owner) and links back to the documents it read. The rare hard calls go to the one person who can make them: a ten-minute review, not a weekly meeting.
  • Let the questions steer it. The cache-miss log (hers, or the one hiding in your sent mail) tells the compiler which answers carry the practice. New questions keep tuning it. The thing finally learns from being asked [9].

The result demos in one sentence: ask anything about your business, get the answer, and see the documents it read. The receipts matter more than the fluency — an answer that shows its sources is one a sceptical owner (rightly sceptical, given what she’s been sold before) can check in one click.
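
A minimal sketch of “conclude, with receipts”, assuming hypothetical `Source` and `CompiledAnswer` types, invented file paths, and invented deposit amounts. The ranking uses the three signals named above (newest, most-referenced, authored by the process owner); treating them as tie-breakers in that order is an assumption of this example, not a rule from the article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    path: str                 # pointer only; the original is never moved or edited
    written_on: date
    by_process_owner: bool


@dataclass(frozen=True)
class CompiledAnswer:
    text: str
    reason: str
    sources: tuple[Source, ...]


def conclude(candidates: dict[str, list[Source]]) -> CompiledAnswer:
    """Pick one canonical answer from competing versions and record why."""
    def rank(item: tuple[str, list[Source]]):
        _, docs = item
        return (
            max(d.written_on for d in docs),          # newest
            len(docs),                                # most-referenced
            any(d.by_process_owner for d in docs),    # authored by the process owner
        )

    text, docs = max(candidates.items(), key=rank)
    newest = max(d.written_on for d in docs)
    reason = f"newest version ({newest.isoformat()}), stated in {len(docs)} document(s)"
    # Receipts: every document read, newest first, superseded ones included.
    every_doc = sorted(
        (d for group in candidates.values() for d in group),
        key=lambda d: d.written_on,
        reverse=True,
    )
    return CompiledAnswer(text, reason, tuple(every_doc))


# Invented candidates for illustration.
candidates = {
    "Take a $50 deposit.": [
        Source("shared-drive/qa-log.docx#2016", date(2016, 3, 1), True),
    ],
    "Take a $100 deposit.": [
        Source("shared-drive/qa-log.docx#2024", date(2024, 2, 1), True),
        Source("mail/sent/2024-02-05.eml", date(2024, 2, 5), False),
    ],
}
answer = conclude(candidates)
print(answer.text)    # Take a $100 deposit.
print(answer.reason)  # newest version (2024-02-05), stated in 2 document(s)
```

The answer carries its reason and every source it read, so the owner can check it against the originals. Cases where the ranking ties are the ones that would go to the one person who can make the call.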

The pitch that writes itself

Which brings us back to the owner, because the commercial version of this story is almost embarrassingly short. For any small business that has been operating for a few years, the pitch is:

You’ve already answered every question your staff will ever ask — probably several times. We compile the answers into something that answers back.

No months of workshops. No “knowledge audit”. No asking the team to change how they file things — the filing was never going to change, and with compilation it doesn’t need to. The raw material is already there, sitting in the exhaust of a decade of just running the business: the emails, the shared drive, the meeting notes and, if you’re lucky, a Word document lovingly maintained by the one person disciplined enough to log every cache miss for ten years.

She’d have signed on the spot — ten years and several hundred pages ago.

And the questions won’t stop when the map exists, by the way. They just stop landing on you. The new receptionist still won’t know the HICAPS code — day one, nobody does. The difference is that the question gets answered by the compiled practice, in seconds, with the source attached, instead of by the owner, on her day off, for the eleventh time. Staff were never the problem. Now they get an infrastructure that finally treats them that way.

Where this goes next

The Word document is one corpus and one interface — questions in, answers out. The full play for a small business goes further: the same compiled map can write the onboarding manual (and keep it current), shape itself to each role at the practice, and eventually sit beside staff while they work. And underneath it all sits a harder question this article deliberately skipped: why shared drives rot in the first place — no matter how disciplined the team is. That diagnosis, and the read-only fix, is where the ebook begins.

Read the field guide: The SMB Knowledge Play

The companion ebook works the whole play end to end — why shared drives rot by design (one parent, many truths), the read-only compilation layer that fixes them without a migration, the dental story in full, manuals as build outputs that are never stale, role-shaped views of one map, seed wikis for verticals, the split-screen copilot, and why the floor — not the ceiling — is where trust is won. The ebook link is in the post above. If you’re the person your business can’t stop asking, that’s exactly the work we do at LeverageAI.

References

  [1] M-Files. “2019 Intelligent Information Management Benchmark Report” (n=1,500). “More than eight in ten (83%) say that they’ve had to recreate a document which already existed because they were unable to find it on their corporate network… Over two-thirds of respondents (68%) stated that it’s either always, mostly or sometimes difficult to find the right version of a document.” www.project-consult.de/wp-content/uploads/2019/04/M-Files_IIM_Benchmark_Report_2019.pdf
  [2] McKinsey Global Institute (social economy report figure, via Cottrill Research). “Employees spend 1.8 hours every day—9.3 hours per week, on average—searching and gathering information. Put another way, businesses hire 5 employees but only 4 show up to work; the fifth is off searching for answers, but not contributing any value.” cottrillresearch.com/various-survey-statistics-workers-spend-too-much-time-searching-for-information
  [3] Panopto. “Workplace Knowledge and Productivity Report” (2018). “U.S. knowledge workers waste 5.3 hours every week either waiting for vital information from their colleagues or working to recreate existing institutional knowledge,” at an estimated $47M per year in lost productivity for large firms. www.prnewswire.com/news-releases/inefficient-knowledge-sharing-costs-large-businesses-47-million-per-year-300681971.html
  [4] Polanyi, Michael. “The Tacit Dimension” (1966). “We can know more than we can tell.” en.wikipedia.org/wiki/Tacit_knowledge
  [5] Microsoft. “What’s new in Content Governance in SharePoint, OneDrive, and Teams for the AI era” (2025) and SharePoint Copilot guidance. “Users of Copilot may receive outdated results generated from content on inactive sites”; “If your SharePoint environment is cluttered with stale content or overshared files, Copilot’s responses—and your security posture—will suffer.” techcommunity.microsoft.com/blog/spblog/what%E2%80%99s-new-in-content-governance-in-sharepoint-onedrive-and-teams-for-ai-era/4411645
  [6] Resonate, citing DentalPost 2025 Salary Survey and industry data. “Dental practices experience 30-40% annual turnover among front desk and administrative staff, representing roughly double the national industry average… costing practices up to $26,000 per departure.” www.resonateapp.com/resources/dental-front-desk-staffing-statistics
  [7] Gartner (via Research World). “Unstructured data represents an estimated 80 to 90 per cent of all new enterprise data… growing three times faster than structured data.” researchworld.com/articles/possibilities-and-limitations-of-unstructured-data
  [8] Farrell, Scott. “The Index Is the Data: How a Self-Cleaning Wiki-Graph Out-Thinks RAG.” LeverageAI. The compilation machinery: pre-process the corpus off-cycle into claims and edges so retrieval becomes navigation of a pre-built map rather than a query-time crawl. leverageai.com.au/the-index-is-the-data-how-a-self-cleaning-wiki-graph-out-thinks-rag/
  [9] Farrell, Scott. “File Back the Walk.” LeverageAI. A query is a write in disguise: answered questions and their paths are telemetry that improve the map, which is why a decade of logged questions is the demand-side half of a knowledge base. leverageai.com.au/file-back-the-walk/

© 2026 Leverage AI, Scott Farrell. All rights reserved. This content is made available on a limited, revocable, read-only basis only. No licence or right is granted to copy, reproduce, republish, scrape, store, adapt, summarise, index, embed, or use this content to create derivative works, work product, deliverables, methodologies, training materials, prompts, templates, software, services, research, or commercial outputs, whether by humans or machines, without prior written permission. This restriction includes internal business use, client work, consulting, advisory, implementation, and any use in or for artificial intelligence, machine learning, data extraction, retrieval, evaluation, fine-tuning, or knowledge-base construction.