You are halfway through a compliance review when the auditor asks how long the company keeps customer data. You type the question into the internal assistant your team ships, and the answer comes back confident, cited, and wrong: it quotes the 2024 data-retention policy, eighteen months, section number and all. The current policy, signed in early 2026 and cutting retention to twelve months, sits two folders away in the same document store, and both versions have been in the index for months. Nothing malfunctioned. Retrieval matched a relevant document, the model produced an answer grounded in the passage it was handed, and the citation checks out. Nobody ever said which version wins, so a superseded policy just answered a legal question in front of an auditor.
That failure is not a retrieval bug, and no amount of tuning the search stack fixes it. "Retrieval: let the product look things up before it answers" gave your product a way to pull facts at answer time, and every chapter since has assumed those facts are sound. An index, though, is an inventory of claims about the world, the world keeps moving after ingestion, and an ungoverned inventory rots.
Every source needs an owner, a shelf life, and a recall path
When a team first wires up retrieval, ingestion feels like the finish line: the documents go in, the answers improve, and everyone moves on. From that day forward, though, the index serves whatever it was fed, while prices change, policies get re-signed, and the product drifts away from its own documentation. Governing the index means keeping that inventory honest, and it comes down to three questions you ask of every source before it enters.
- Who owns it? A person, with their name on the source list, who can say whether the document is still right and has the authority to pull it. If nobody owns a source, nobody pulls it when it goes wrong.
- How long does it stay trustworthy? Every document has a shelf life, and the only choice is whether you set it at ingestion or discover it during an audit.
- What happens when two sources disagree? The retention failure in the opening scene was not a missing document but a missing rule, because both versions were present and nothing said which one wins.
Every source needs an owner, a shelf life, and a recall path, decided when it enters the index rather than after it fails an audit.
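One way to make "decided when it enters the index" concrete is an ingestion gate that refuses any source with an unanswered question. The sketch below is a minimal illustration, not a prescribed schema: `SourceEntry`, its field names, and the rule labels are all hypothetical, and a real source list would likely carry more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceEntry:
    """One row of the source list. All field names are hypothetical."""
    doc_id: str
    owner: Optional[str]          # a named person, not a team alias
    content_class: Optional[str]  # the class decides the shelf life
    conflict_rule: Optional[str]  # e.g. "newest_wins" or "authoritative"


def missing_answers(entry: SourceEntry) -> list[str]:
    """Return the governance questions this source has not answered yet."""
    gaps = []
    if not entry.owner:
        gaps.append("owner")
    if not entry.content_class:
        gaps.append("content_class")
    if not entry.conflict_rule:
        gaps.append("conflict_rule")
    return gaps


def admit(entry: SourceEntry) -> bool:
    """Admit a source only when all three questions have an answer."""
    return not missing_answers(entry)
```

Returning the list of gaps rather than a bare rejection gives whoever submitted the document something specific to fix before trying again.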
Freshness SLAs: give every content class a shelf life
You cannot re-review every document forever, and you do not have to, because documents rot at rates set by the kind of content they hold. Give each content class a freshness SLA (a service-level agreement, here meaning a hard deadline by which a document must be re-verified or retired), and let the deadlines decide what gets reviewed.
- Policies, prices, and anything contractual rot in weeks. A stale answer here creates legal or financial exposure, so the SLA is short: re-verify monthly, or on every signing event, whichever comes first.
- Product docs rot in a release cycle. They go wrong the day a release changes the behavior they describe, so tie their SLA to your release rhythm rather than the calendar and re-verify the affected pages as part of shipping.
- Foundational explainers rot in years. A document on what an API key is, or how your industry's core workflow runs, stays true across releases, so an annual pass is enough.
The mechanics should be boring. Stamp every document with its ingestion date and its class the moment it enters the index, run a job that flags anything past its deadline, and route each flag to the owner. Expiry by class replaces the review of everything, and it keeps the total review load small enough to actually happen.
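The flagging job can be as small as the sketch below. It assumes each document carries a class, an owner, and a last-verified date (set to the ingestion date on entry). The class names and the day counts are illustrative assumptions; in particular, the 90-day figure for product docs is only a calendar stand-in for your release rhythm.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative shelf lives per content class. The numbers are assumptions
# loosely following the monthly / per-release / annual guidance above.
SHELF_LIFE = {
    "policy": timedelta(days=30),
    "product_doc": timedelta(days=90),   # stand-in for one release cycle
    "foundational": timedelta(days=365),
}


@dataclass(frozen=True)
class StampedDoc:
    doc_id: str
    content_class: str
    last_verified: date  # the ingestion date until the first re-verification
    owner: str


def is_expired(doc: StampedDoc, today: date) -> bool:
    """A document expires once its age exceeds its class's shelf life."""
    return today - doc.last_verified > SHELF_LIFE[doc.content_class]


def flag_queue(docs: list[StampedDoc], today: date) -> dict[str, list[str]]:
    """Group the ids of expired documents by the owner who must review them."""
    queue: dict[str, list[str]] = {}
    for doc in docs:
        if is_expired(doc, today):
            queue.setdefault(doc.owner, []).append(doc.doc_id)
    return queue
```

Passing `today` in as an argument, instead of reading the clock inside the function, keeps the job easy to test and lets you preview next month's queue.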
Conflict rules: decide who wins before two sources disagree
Two documents in the same index will eventually make incompatible claims, and the moment they do is the wrong time to pick the winner, because by then the wrong answer has usually shipped. Write the conflict rules first, and keep them simple enough to audit.
- Newest wins by default. For most classes the most recently verified document is the right one, and the date stamps you already added resolve these conflicts automatically.
- Keep an override list for the classes where newest is wrong. A draft policy is newer than the signed policy it is written to replace, and a customer email is newer than the contract it contradicts. For those classes, designate the authoritative source (the signed document, the system of record) and let it beat anything newer until it is formally replaced.
- Make the tiebreak visible. When the product answers from one of two conflicting documents, log which rule picked the winner, and where the stakes justify it, say so in the answer ("per the retention policy signed February 2026"). A tiebreak the team can see gets corrected when it picks wrong.
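The three rules above fit in one small function. This is a sketch under stated assumptions: both candidates belong to the same content class, each carries a last-verified date, and an `authoritative` flag marks the signed document or system of record. The class names in `OVERRIDE_CLASSES` and the rule labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Classes where the designated authoritative source beats anything newer.
# These class names are hypothetical.
OVERRIDE_CLASSES = {"policy", "contract"}


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    content_class: str
    last_verified: date
    authoritative: bool = False  # e.g. the signed document or system of record


def pick_winner(a: Candidate, b: Candidate) -> tuple[Candidate, str]:
    """Return the winning document and the name of the rule that chose it.

    Assumes both candidates share a content class.
    """
    # Override rule: in listed classes, exactly one authoritative source wins.
    if a.content_class in OVERRIDE_CLASSES and a.authoritative != b.authoritative:
        return (a if a.authoritative else b), "authoritative_source"
    # Default rule: the most recently verified document wins.
    winner = a if a.last_verified >= b.last_verified else b
    return winner, "newest_wins"
```

Returning the rule name alongside the winner is what makes the tiebreak visible: the caller can log it with every answer and surface it in the response where the stakes justify it.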
The recall path: one action that removes a bad document everywhere
Sooner or later a document turns out to be wrong rather than merely stale, and someone has to pull it. The dangerous assumption is that deleting it from the index finishes the job, because a document that has lived in a production system has copies: retrieved passages sit in caches, popular answers built on it may be stored whole, and as "Context budgets: fit the right facts into a finite window" showed, compaction folds retrieved facts into summaries that outlive the documents they came from.
A recall counts only when one action clears the document from the index, the caches, and every derived summary that absorbed it, because the model answers from whichever copy the product hands it.
So build the recall path as a single action, a script or a runbook, before you need it: remove the source document from the index, invalidate any cache that could hold its passages or answers built on them, and regenerate or expire the compacted summaries that ingested its claims. Then test it the way you test backups, by recalling a harmless document and confirming that no path still produces its content.
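A recall action and its test might look like the sketch below. It uses in-memory dictionaries as stand-ins for your real index, caches, and summary store, and it rests on one important assumption: every cached entry and every compacted summary records the ids of the documents it was built from. Without that provenance, a recall has nothing to search for. All names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeStores:
    """In-memory stand-ins for the index, caches, and compacted summaries."""
    index: dict[str, str] = field(default_factory=dict)           # doc_id -> text
    cache: dict[str, set[str]] = field(default_factory=dict)      # cache key -> source doc_ids
    summaries: dict[str, set[str]] = field(default_factory=dict)  # summary id -> source doc_ids


def recall(stores: KnowledgeStores, doc_id: str) -> dict[str, int]:
    """Remove one document everywhere and report how much was cleared."""
    removed = 1 if stores.index.pop(doc_id, None) is not None else 0
    stale_cache = [k for k, srcs in stores.cache.items() if doc_id in srcs]
    for key in stale_cache:
        del stores.cache[key]
    stale_summaries = [s for s, srcs in stores.summaries.items() if doc_id in srcs]
    for sid in stale_summaries:
        del stores.summaries[sid]  # expire here; a real system might regenerate instead
    return {"index": removed, "cache": len(stale_cache), "summaries": len(stale_summaries)}


def still_reachable(stores: KnowledgeStores, doc_id: str) -> bool:
    """The backup-style check: does any path still lead to the recalled document?"""
    return (
        doc_id in stores.index
        or any(doc_id in srcs for srcs in stores.cache.values())
        or any(doc_id in srcs for srcs in stores.summaries.values())
    )
```

Running `recall` on a harmless document and then asserting that `still_reachable` returns `False` is the rehearsal described above. The returned counts give the runbook a record of what each recall actually touched.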
The operating cadence: a short review with a name on it
Governance survives only when it runs as a routine. Put a short recurring review on the calendar for the highest-rot classes, monthly for policies and prices and at each release for product docs, and give it a single owner whose name is on the source list, because a review owned by "the team" is owned by nobody. The session stays short by design: work the flag queue, confirm or retire the expired documents, and check that the conflicts logged since last time resolved under the right rule.
This cadence is the activity we call "Govern the knowledge" in The Operating Manual, and its outputs, the source list with owners, the SLA table, the override list, and the recall runbook, drop straight into the Knowledge Charter you will complete at the end of the part. The index-refresh mechanics of your particular stack live in the Retrieval Stack Sheet, because those details change faster than a chapter should.
Try it now
This drill takes 30 to 45 minutes and produces the first rows of your Knowledge Charter.
Pull your document set. Take the actual set your product answers from, or if you have not shipped retrieval yet, the folder your team treats as its reference library.
Scale it down: if the set runs past about fifty documents, govern the twenty most retrieved (or most asked about) first; a governed core is worth more than an ungoverned archive.
Date-stamp everything. For each document, record the date of its last substantive edit, not the file's modified date, which changes whenever someone fixes a typo.
Assign a class and a shelf life. Sort each document into policy-grade, release-cycle, or foundational, give each class its SLA, and flag every document already past its deadline.
Write one conflict rule. Find the class where two documents currently disagree (there is nearly always one) and write the rule that picks the winner: newest wins, or the authoritative source that beats newest.
Write the recall steps for your riskiest source. Pick the document whose wrongness would cost the most and list every place its content could live: index, caches, stored answers, compacted summaries. That list is your recall runbook, and each step should name the command or the person that executes it.
Chapter Summary
- A knowledge base is an inventory of claims about the world, and the world keeps moving after ingestion, so an ungoverned index rots.
- Ask three questions of every source when it enters the index: who owns it, how long it stays trustworthy, and what happens when it disagrees with another source.
- Give every content class a freshness SLA: policies and prices rot in weeks, product docs in a release cycle, foundational explainers in years.
- Date-stamp documents at ingestion and expire them by class, so review effort goes where the rot is fastest.
- Write conflict rules before conflicts arrive: newest wins by default, an override list for the classes where the authoritative source beats newest, and a tiebreak the team can see and audit.
- Build the recall path as one action that clears the index, the caches, and the compacted summaries, and test it before you need it.
- Keep governance alive with a short recurring review of the highest-rot classes, owned by a person whose name is on the source list.
- Governed facts still need proof that the answers actually stand on them, and "Grounding evals: prove the answer stands on the facts" builds that proof.
Sources
- DAMA International (2017). DAMA-DMBOK: Data Management Body of Knowledge, 2nd edition. Technics Publications.
- Uber Engineering (2024). Genie: Uber's Gen AI on-call copilot. Uber engineering blog.
- Azure AI Search documentation on indexer schedules and index refresh (last verified July 2026).