AI-Native PM

Why hallucinations are context failures, not model failures

Your support bot answers a refund question with complete confidence: returns accepted within 60 days, full refund, no questions asked. The policy changed last quarter to 30 days, and the customer posts a screenshot of the bot's answer next to the checkout page that contradicts it. In the postmortem someone says the model hallucinated, the team swaps in the newest model, and the sharper demo answers settle the argument. Within days the same 60-day promise lands in another customer's inbox, screenshotted again. Nothing about the swap touched the help-center export from last year that retrieval still serves on every policy question, so each new model reads the same stale document and answers from it, more fluently each time.

This chapter opens the part of The Frontier about what your product knows and remembers, and it starts with the most familiar failure in AI products, which usually gets blamed on the wrong component.

One failure, three costumes: fabrication, staleness, and loss

A hallucination, the industry word for a confident answer with no basis in fact, arrives wearing one of three costumes, and telling them apart is the first diagnostic skill of this part.

  • Fabrication: the model produced an answer with no facts in front of it. Ask a research assistant for papers on a narrow topic and it can return a citation with a plausible author list and a real-sounding journal for a paper that does not exist. Nothing about any real paper sat in its context, so it produced the most statistically likely text for the request.
  • Staleness: the model was handed facts that were once true. The support bot above is the canonical case. Retrieval worked, the document was real, and the 60-day policy in it was accurate on the day it was exported. From inside the window, a fact that was true last year reads exactly like a fact that is true today.
  • Loss: the fact was present and fell out. A meeting notetaker processes ninety minutes in which the group reverses a decision during the final ten, and the summary reports the early version, because the reversal either fell outside what the summarizing call was given or sat so deep in a long transcript that it carried no weight. Research on long contexts backs this up (Liu et al., 2023): models use facts placed at the start and end of a long window far more reliably than facts buried in the middle.

All three arrive in the bug tracker as "the model got it wrong," and each demands a different fix, none of which is a different model. The fabricating research assistant needed a real literature index to draw on, the support bot needed last quarter's policy in place of last year's, and the notetaker needed the reversal kept within reach of the final summarizing call.
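The support bot's fix can be sketched as a freshness gate that sits between retrieval and the window. This is a minimal illustration, not a prescribed implementation: the RetrievedDoc type, the exported_on field, the 90-day threshold, and every date below are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetrievedDoc:
    title: str
    text: str
    exported_on: date  # hypothetical metadata: when the source was last captured


def split_by_freshness(docs, today, max_age_days=90):
    """Split retrieved documents into (fresh, stale) by their export date."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [d for d in docs if d.exported_on >= cutoff]
    stale = [d for d in docs if d.exported_on < cutoff]
    return fresh, stale


# Illustrative data only: two versions of the refund policy.
docs = [
    RetrievedDoc("Refund policy (old export)", "Returns accepted within 60 days.", date(2025, 3, 1)),
    RetrievedDoc("Refund policy (current)", "Returns accepted within 30 days.", date(2026, 1, 15)),
]

fresh, stale = split_by_freshness(docs, today=date(2026, 2, 1))
for doc in stale:
    print("Held back as stale:", doc.title)
for doc in fresh:
    print("Sent to the window:", doc.title)
```

The threshold is the product decision here. A gate like this only works if the source records when each document was captured, which is itself a supply-chain requirement.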

A hallucination is usually a supply problem: the model produced an answer without the facts it needed in front of it.

Fix the supply of facts, not the model

Of everything in your product, the model is the component you control least. The provider trains it, versions it, and retires it; you pick from a menu, a decision we covered in "Choose a model you can live with." The context is the opposite. Every token the model reads on a call is there because your product put it there: the sources you connected, the records the product stored, the documents retrieval selected, the instructions you wrote. When an answer goes wrong, the model is the part you can only swap, and the context is the part you can actually fix.
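One way to picture "your product put it there" is a window assembled from labeled parts, with a record of where each part came from. The sketch below is illustrative only: ContextPart, assemble_window, and the origin labels are hypothetical names, and a real product would assemble its prompt however its stack requires.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    origin: str  # hypothetical label for which part of the product supplied the text
    text: str


def assemble_window(parts):
    """Join the parts into one prompt and return it with a provenance log."""
    prompt = "\n\n".join(f"[{p.origin}]\n{p.text}" for p in parts)
    provenance = [(p.origin, len(p.text)) for p in parts]
    return prompt, provenance


parts = [
    ContextPart("instructions", "Answer refund questions from the policy text only."),
    ContextPart("retrieved_document", "Returns accepted within 60 days."),
    ContextPart("user_question", "Can I still return my order?"),
]

prompt, provenance = assemble_window(parts)
for origin, size in provenance:
    print(f"{origin}: {size} characters")
```

Keeping a log like this per call is what makes it possible to read the exact material a failing answer was built from.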

That asymmetry is why the reframe pays off. A team that treats hallucinations as model failures rides the upgrade treadmill and keeps its bugs, while a team that treats them as context failures repairs the supply of facts and watches all three costumes fade together, because each one is a version of the wrong material arriving in the window.

The knowledge supply chain

To repair the supply, it helps to see it as a chain. A fact lives at a source (a policy page, a CRM record, a meeting transcript), gets prepared for lookup in an index (the structure that makes a large pile of documents searchable in an instant), gets picked at question time by retrieval, lands in the window next to the user's question, and comes out in the answer. Each costume is a break at a specific link, which is what makes the classification useful.

  • Fabrication breaks at the source or at retrieval. The fact was never captured anywhere the product can reach, or the lookup came back empty and the product answered anyway.
  • Staleness breaks at the source or the index. The fact was captured once, the world moved on, and nothing re-captured it.
  • Loss breaks at the window. The fact arrived and was pushed out by newer material, or sat in the middle of a long context where models use it least reliably.

Every grounded answer travels a knowledge supply chain, from source to index to retrieval to window to answer, and each kind of hallucination is a break at one specific link.
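The chain and its break points are small enough to write down as a lookup table. The sketch below only restates the classification above in code; the names CHAIN, BREAK_POINTS, and links_to_inspect are hypothetical.

```python
# The five links, in the order a fact travels through them.
CHAIN = ["source", "index", "retrieval", "window", "answer"]

# Where each costume breaks, as described in this chapter.
BREAK_POINTS = {
    "fabrication": ["source", "retrieval"],
    "staleness": ["source", "index"],
    "loss": ["window"],
}


def links_to_inspect(costume):
    """Return the links to check for a costume, in chain order."""
    links = BREAK_POINTS[costume.lower()]
    return sorted(links, key=CHAIN.index)


for costume in BREAK_POINTS:
    print(costume, "->", ", ".join(links_to_inspect(costume)))
```

The value of the table is that it narrows the search: a staleness bug sends you to the source and the index, not to the prompt or the model.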

Context engineering: deciding what the model sees

The industry has settled on a term for managing this supply: context engineering, which in plain words is deciding what the model sees on every call. The term covers two kinds of work, and this part teaches one of them. The plumbing half (embeddings, chunk sizes, vector databases) belongs to your engineers, and the chapters ahead give you just enough of it to hold your end of that conversation. The decisions half belongs to you: which documents count as the truth, what the product stores about a user and for how long, which fact wins when two sources disagree, how much of the window each kind of material deserves, and what evidence proves an answer stood on its facts. Those are product decisions with customer-visible consequences, and they are yours to make.

If you worked through "Give the model the facts it wasn't trained on," you have already run the entire supply chain by hand, pasting the document into the prompt yourself. This part builds the version that runs without you, at production volume.

What this part covers

The chapters follow the supply chain link by link. "Retrieval: let the product look things up before it answers" builds the basic lookup loop, and "The search stack: match the retrieval method to the question" chooses the machinery behind it. "Memory: decide what your product remembers" covers what the product stores across sessions, "Context budgets: fit the right facts into a finite window" manages the window itself, and "Freshness and conflicts: govern the knowledge you answer from" keeps the sources current and consistent. "Grounding evals: prove the answer stands on the facts" turns grounding into evidence you can check, and the part closes when you write your Knowledge Charter and ship a product that knows its facts.

Try it now

This drill takes about 15 minutes and produces the diagnostic move you will use for the rest of the part.

Get your specimen. Pick one wrong answer you have personally seen a real AI product give with confidence. If nothing comes to mind, elicit one: ask a product you use about its own newest feature, released in the last month or two, and there is a fair chance the answer describes the product as it stood when the model was trained.

Reconstruct what the model could have seen. List what plausibly sat in the window on that call: the system instructions, any retrieved documents, the conversation up to that point. You are estimating from the outside rather than auditing logs, and the estimate is enough.

Classify the costume. If nothing relevant could have been available, call it fabrication. If something relevant was available but outdated, call it staleness. If the correct fact appeared earlier in the conversation or clearly lives in the product's own records yet went missing from the answer, call it loss.

Locate the broken link. Place the failure on the chain: source, index, retrieval, window, or answer, with one sentence of reasoning next to it.

Write the supply fix. One sentence stating what the supply chain should have delivered and how, in the spirit of "re-export the help center on a weekly schedule" or "retrieve the release notes before answering any feature question." Keep the sentence, because the next chapter starts building the machinery that executes it.
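If it helps to keep the drill's output in a consistent shape, the classification rules and the five answers can be sketched as a small record. This is only one possible worksheet: classify_costume and DrillRecord are hypothetical names, and the two yes/no inputs are your outside estimates, not facts read from logs.

```python
from dataclasses import dataclass


def classify_costume(relevant_fact_available: bool, fact_was_current: bool) -> str:
    """Apply the drill's three rules to one confidently wrong answer."""
    if not relevant_fact_available:
        return "fabrication"  # nothing relevant could have been in the window
    if not fact_was_current:
        return "staleness"    # something relevant was there, but outdated
    return "loss"             # the correct fact existed and went missing


@dataclass
class DrillRecord:
    specimen: str          # the wrong answer you observed
    window_estimate: list  # what plausibly sat in the window
    costume: str
    broken_link: str       # source, index, retrieval, window, or answer
    supply_fix: str        # one sentence


# Worked example using the support bot from the start of the chapter.
record = DrillRecord(
    specimen="Bot promised a 60-day refund; the policy is 30 days.",
    window_estimate=["system instructions", "last year's help-center export", "the customer's question"],
    costume=classify_costume(relevant_fact_available=True, fact_was_current=False),
    broken_link="source",
    supply_fix="Re-export the help center on a weekly schedule.",
)
print(record.costume)  # staleness
```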

Chapter Summary

  • A hallucination is a confident answer produced without the facts it needed, and it is usually a supply failure rather than a model failure.
  • The failure wears three costumes: fabrication (no facts in front of the model), staleness (facts that were once true), and loss (the fact was present and fell out of reach).
  • The model is the component you control least, and the context is the component you control most, so the repair for all three costumes is made in the supply of facts rather than in the model.
  • Every answer travels a knowledge supply chain, from source to index to retrieval to window to answer, and each costume is a break at a specific link.
  • Swapping models leaves a broken supply chain broken; read the exact context a failing answer was built from before you blame the model.
  • The work of deciding what the model sees on every call is called context engineering, and this part teaches its decision half, the product calls rather than the plumbing.
  • The first link to build is lookup, which is where "Retrieval: let the product look things up before it answers" begins.

Sources

  • Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
  • Liu, N. F., Lin, K., Hewitt, J., et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics.
  • Anthropic engineering blog (2025). Effective context engineering for AI agents.
  • Anthropic and OpenAI context window and retrieval documentation (last verified July 2026).