Retrieval: let the product look things up before it answers · The Builder's Stack


Your research assistant product handles competitive questions well enough that the sales team starts leaning on it. Then a rep asks what a rival charges for its enterprise tier, pastes the answer into a proposal, and the prospect corrects it in the meeting, because the rival cut that price months ago. The product answered from the model's training data, which froze sometime last year, and nothing in the answer hinted that the number could be stale. The fix your team ships is not a smarter model or a sterner prompt. It is one lookup step, a tool call that fetches the rival's live pricing page before the model writes a word, and the same model that embarrassed you in that meeting now quotes the current number.

Why hallucinations are context failures, not model failures made the case that most wrong answers happen because the right facts never reached the request. This chapter covers the standard fix: a retrieval step that lets the product look things up before it answers. There are two ways to build that step, and this chapter treats the choice between them as product work rather than plumbing.

Retrieval is a product decision about when the system checks the record instead of answering from training data.

Two architectures: search first, or hand the model a search tool

One-shot retrieval searches before the model runs. Your code takes the user's question, runs it as a query against your documents, and pastes the best matches into the request; the model then answers once from what it was handed. This is the classic pipeline the industry calls RAG, short for retrieval-augmented generation: retrieve first, then generate, in that fixed order, with the model playing no part in the search.
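The fixed order is easiest to see in code. The sketch below makes several assumptions. A word-overlap score over an in-memory dictionary stands in for a real search index. `call_model` is a hypothetical placeholder for your provider's client, so the sketch runs offline. The names `search`, `build_prompt`, and `one_shot_answer` are illustrative only.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for a real search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by shared query words and keep the top k that share any."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id, body) for doc_id, body in docs.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties broken by id
    return [(doc_id, body) for _, doc_id, body in scored[:k]]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Paste the retrieved passages into the request ahead of the question."""
    context = "\n".join(f"[{doc_id}] {body}" for doc_id, body in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for one model call; swap in your provider's client."""
    return f"(answer generated from a {len(prompt)}-character prompt)"


def one_shot_answer(question: str, docs: dict[str, str]) -> tuple[str, list[str]]:
    passages = search(question, docs)          # 1. search once, before the model runs
    prompt = build_prompt(question, passages)  # 2. paste the matches into the request
    answer = call_model(prompt)                # 3. the model answers once
    # Return the source ids too, so they can be logged next to the answer.
    return answer, [doc_id for doc_id, _ in passages]


docs = {
    "pricing": "Enterprise tier pricing is listed per seat per month.",
    "refunds": "Refunds are issued within 30 days of purchase.",
}
answer, sources = one_shot_answer("What does the enterprise tier cost per seat?", docs)
print(sources)  # ['pricing']
```

The model never sees the question until the search is over. Because the function returns the source ids, you get the log that makes one-shot easy to audit.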

Agentic retrieval moves the search inside the model's turn. The request includes a search tool, and the model controls when the tool is called, takes what comes back as new input, and can search again with a better query when the first results miss. The loop ends when the results support an answer or a step limit cuts it off. The engineering accounts behind agentic research products describe exactly this pattern: the first query is a rough guess, and the follow-up queries, rewritten after real results come back, are what find the answer.
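The loop below is a minimal sketch of that pattern. `decide` represents the model's choice on each turn. It returns either a query to run or a final answer. Here a scripted, hypothetical `scripted_decide` plays the model on a two-hop question so the sketch runs without an API. The documents, `search_tool`, and `Trace` are illustrative assumptions, not any provider's interface.

```python
import re
from dataclasses import dataclass, field


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tool(query: str, docs: dict[str, str]) -> list[tuple[str, str]]:
    """The tool your code owns: every (id, text) pair sharing a word with the query."""
    q = words(query)
    return sorted((doc_id, body) for doc_id, body in docs.items() if q & words(body))


@dataclass
class Trace:
    """Everything the model has seen so far; it doubles as your audit log."""
    queries: list[str] = field(default_factory=list)
    results: list[list[tuple[str, str]]] = field(default_factory=list)


def agentic_answer(question, docs, decide, max_steps=4):
    """The model (decide) chooses each query; your code enforces the step limit."""
    trace = Trace()
    for _ in range(max_steps):
        action, payload = decide(question, trace)
        if action == "answer":
            return payload, trace
        trace.queries.append(payload)
        trace.results.append(search_tool(payload, docs))  # results become new input
    return "Step limit reached before the results supported an answer.", trace


def scripted_decide(question, trace):
    """Hypothetical stand-in for the model on a two-hop question (assumes hits exist)."""
    if not trace.queries:
        return "search", "Acme headquarters"  # first query: a rough guess
    if len(trace.queries) == 1:
        first_hit_text = trace.results[0][0][1]
        city = first_hit_text.split()[-1].rstrip(".")
        return "search", f"{city} data rule"  # rewritten after real results came back
    return "answer", " ".join(text for _, text in trace.results[-1])


docs = {
    "acme_hq": "Acme is headquartered in Lyon.",
    "lyon_rules": "Lyon offices follow the regional data rule R7.",
}
answer, trace = agentic_answer("Which data rule covers Acme's headquarters?", docs, scripted_decide)
print(trace.queries)  # ['Acme headquarters', 'Lyon data rule']
```

A single fixed query could not have asked about Lyon, because the city only appears in the first search's results.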

Notice what moved between the two architectures. In one-shot, your code owns the search and the model consumes whatever the search found. In agentic, the model runs the search, and your code owns the tool, the step limit, and the bill.

The tradeoffs you own: cost, predictability, and stakes

The two architectures pull in opposite directions on cost, predictability, and the questions they can handle, so treat the choice as a roadmap decision.

One-shot is cheap, fast, and predictable. Every answer costs one search plus one model call, latency stays flat, and because the retrieved passages sit in the logs next to every answer, checking whether the answer used them is straightforward.
Agentic handles the questions one-shot fumbles. A vague question ("are we exposed to the new EU rules?") and a multi-hop question (one whose answer needs a second search that depends on what the first search found) both defeat a single fixed query, and a model that can re-query after the first results come back works through them. The price is several searches and model calls per answer, and less predictability, since two runs of the same question can take different paths to different passages.
A workable decision rule: match the architecture to the question type and the stakes. Direct questions whose answer sits in one findable document get one-shot. Vague, comparative, or multi-hop questions get agentic, and so does any question where a wrong answer costs enough to justify paying for follow-up searches. Many products run both, with one-shot as the default and agentic as the escalation.
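The decision rule above can be sketched as a small router. This sketch rests on three assumptions. The question kind comes from hand labels or a classifier you supply. `wrong_answer_cost` and `stakes_threshold` are hypothetical numbers in whatever unit you track. Escalating when one-shot returns no sources is one possible trigger among several.

```python
from dataclasses import dataclass
from typing import Callable, Literal

QuestionKind = Literal["direct", "vague", "comparative", "multi_hop"]


@dataclass
class Question:
    text: str
    kind: QuestionKind        # assumed to come from hand labels or a classifier
    wrong_answer_cost: float  # hypothetical cost of a wrong answer


def route(q: Question, stakes_threshold: float) -> str:
    """Direct, low-stakes questions get one-shot; everything else gets agentic."""
    if q.kind != "direct" or q.wrong_answer_cost >= stakes_threshold:
        return "agentic"
    return "one-shot"


def answer(
    q: Question,
    stakes_threshold: float,
    one_shot: Callable[[str], tuple[str, list[str]]],
    agentic: Callable[[str], str],
) -> str:
    """One-shot as the default, agentic as the escalation."""
    if route(q, stakes_threshold) == "agentic":
        return agentic(q.text)
    text, sources = one_shot(q.text)
    if not sources:  # the single search found nothing: escalate instead of guessing
        return agentic(q.text)
    return text
```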

When agentic retrieval fans out, the bill multiplies the way a fleet's does, with every extra search paying for its own tokens. Economics: what a fleet costs and when it pays prices that multiplication in full.
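A rough token count shows why the multiplication bites. The sketch below makes three assumptions. Every model call in the agentic loop re-reads the prompt plus all results gathered so far. It counts only input tokens and ignores output tokens and per-search fees. The sizes in the example are hypothetical placeholders, not measurements.

```python
def one_shot_tokens(prompt: int, results: int) -> int:
    """One search, then one model call that reads the prompt plus the pasted results."""
    return prompt + results


def agentic_tokens(prompt: int, results_per_search: int, searches: int) -> int:
    """searches + 1 model calls; call i re-reads the prompt plus the i result sets so far."""
    return sum(prompt + i * results_per_search for i in range(searches + 1))


# Hypothetical sizes: a 500-token prompt and 2,000 tokens of results per search.
print(one_shot_tokens(500, 2_000))    # 2500
print(agentic_tokens(500, 2_000, 4))  # 22500
```

Under these assumptions, four searches cost nine times the input tokens of one-shot. The growth is quadratic in the number of searches, because each call re-reads everything found before it. To turn this into a bill, multiply by your provider's per-token rate.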

When not to retrieve

Retrieval is not a default, and three cases argue for skipping it entirely.

The facts fit in the prompt. If your whole refund policy runs two pages, paste it into every request and be done. Give the model the facts it wasn't trained on covers this simplest form of grounding, and for material small enough to fit, no search can beat it.
The knowledge never changes. Definitions, stable standards, and settled method are the material training data holds well, so test whether the bare model already handles them reliably before wiring a lookup in front of them.
The path is latency-critical. Autocomplete in an email assistant or live suggestions in a meeting tool cannot wait for a search round trip, and a lookup that doubles response time can cost more in abandonment than it earns in accuracy.
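The three cases can be written as a gate you run before adding retrieval to a feature. This is a sketch under stated assumptions. The `Feature` fields, the prompt budget, and the round-trip time are hypothetical inputs you would measure yourself. `bare_model_passes` records whether the bare model handled your test questions reliably.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    source_tokens: int            # size of the entire knowledge source
    changes_after_training: bool
    bare_model_passes: bool       # did the bare model handle your test questions reliably?
    latency_budget_ms: int        # how long this path can wait before users give up


def should_retrieve(f: Feature, prompt_budget_tokens: int, search_round_trip_ms: int) -> tuple[bool, str]:
    if f.source_tokens <= prompt_budget_tokens:
        return False, "facts fit in the prompt: paste them into every request"
    if not f.changes_after_training and f.bare_model_passes:
        return False, "stable knowledge the bare model already handles"
    if search_round_trip_ms > f.latency_budget_ms:
        return False, "latency-critical path: no room for a search round trip"
    return True, "retrieve"


pricing = Feature("rival pricing", 500_000, True, False, 5_000)
print(should_retrieve(pricing, 8_000, 400))  # (True, 'retrieve')
```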

Retrieval fails in three ways users can see

Once you ship a retrieval step you have shipped a feature, and it fails like one. That matters for your roadmap because each failure mode needs its own detection and its own fix, and each reaches the user in a different disguise.

A retrieval step is a product feature with its own failure modes, not a box you check on the way to launch.

Found nothing. The search returns no useful passage, and the model either answers from training data anyway, which reproduces the stale-pricing failure retrieval was meant to end, or declines, which the user reads as an unhelpful product.
Found the wrong thing. The search returns a real document that does not answer the question, the way a support bot pulls the retired policy instead of the current one, and the model builds a confident answer on it. This is the most dangerous mode, because the answer cites something genuine and looks better than a bare guess.
Found too much. The search returns pages of near-matches, the useful passage sits buried in the noise, and the user sees slower, vaguer, and more expensive answers.
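Two of the three failures can be flagged from the search results before the model answers. The third can only be partly caught. This sketch makes several assumptions. Your search returns a relevance score between 0 and 1. Your documents carry a `retired` flag. The `min_score` and `max_tokens` thresholds are hypothetical values you would tune.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float           # relevance from your search, assumed 0..1, higher is better
    tokens: int            # passage length
    retired: bool = False  # assumed metadata flag on superseded documents


def diagnose(hits: list[Hit], min_score: float = 0.5, max_tokens: int = 4_000) -> list[str]:
    """Flag visible retrieval failures before the model answers; [] means no flag."""
    relevant = [h for h in hits if h.score >= min_score]
    if not relevant:
        return ["found nothing"]
    flags = []
    if any(h.retired for h in relevant):
        flags.append("found the wrong thing: retired source")
    if sum(h.tokens for h in relevant) > max_tokens:
        flags.append("found too much")
    return flags
```

The retired-source check catches only the superseded-document case, such as the old refund policy. A wrong document that looks current passes every score-based check. That is why this failure needs human-scored answers to detect.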

Try it now

This drill produces the evidence behind every retrieval decision in this part, and it runs on your own documents with your everyday tools.

Get your questions. Write down ten real questions your users ask, or ten questions about a documentation set you have on hand. Mix direct lookups with at least a couple of vague or multi-hop questions.

Run the bare model. Ask all ten with no retrieval and save the answers.

Run one-shot retrieval. Upload the same documents to your provider's file-search feature, which runs the search-then-answer pipeline for you, and ask the ten questions again. The Retrieval Stack Sheet keeps the current tool names and pricing.

Run agentic retrieval. Ask the ten questions a third time with a hosted search tool the model calls itself, over the same documents where your tooling allows it, or over the live web where it does not.

Score all thirty answers. Mark each one found-right, found-wrong, or did-not-look, and note beside each question which architecture it actually needed. That column is the first draft of your retrieval roadmap.

Scale it down: run three questions against one document and score the nine answers the same way.
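A spreadsheet handles the scoring fine. If you prefer code, the sketch below tallies the labels per run. It fills in the "which architecture it needed" column by taking the cheaper retrieval run that scored found-right. That cheapest-that-worked rule is an assumption. The question names and scores are made-up placeholders.

```python
from collections import Counter
from typing import Optional

LABELS = {"found-right", "found-wrong", "did-not-look"}
RUNS = ("bare", "one-shot", "agentic")


def tally(scores: dict[str, dict[str, str]]) -> dict[str, Counter]:
    """Count labels per run. scores maps question -> {run: label}."""
    per_run = {run: Counter() for run in RUNS}
    for question, by_run in scores.items():
        for run, label in by_run.items():
            if run not in per_run or label not in LABELS:
                raise ValueError(f"bad entry for {question!r}: {run}={label}")
            per_run[run][label] += 1
    return per_run


def needed_architecture(by_run: dict[str, str]) -> Optional[str]:
    """Cheaper retrieval run first; None means neither run got it right."""
    for run in ("one-shot", "agentic"):
        if by_run.get(run) == "found-right":
            return run
    return None


scores = {  # placeholder scores, not real results
    "refund window": {"bare": "did-not-look", "one-shot": "found-right", "agentic": "found-right"},
    "EU exposure": {"bare": "did-not-look", "one-shot": "found-wrong", "agentic": "found-right"},
}
for question, by_run in scores.items():
    print(question, "->", needed_architecture(by_run))
# refund window -> one-shot
# EU exposure -> agentic
```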

Chapter Summary

A retrieval step lets the product look facts up before it answers instead of leaning on training data that froze months or years ago.
One-shot retrieval searches first and hands the results to the model, which answers once. This is the classic RAG pipeline.
Agentic retrieval hands the model a search tool, and the model calls it, takes the results as new input, and searches again with a better query when the first pass misses.
One-shot is cheap, fast, predictable, and easy to evaluate. Agentic handles vague and multi-hop questions at more cost and less predictability.
Match the architecture to the question type and the stakes, and consider one-shot as the default with agentic as the escalation.
Skip retrieval when the facts fit in the prompt, when the knowledge never changes, or when the path cannot afford a search round trip.
Retrieval is a feature with three failure modes users can see: found nothing, found the wrong thing, and found too much. The wrong-thing failures are the dangerous ones because they look like good answers.
One-shot or agentic settles who runs the search; what kind of search to run is the next decision, and The search stack: match the retrieval method to the question makes it.

Sources

Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems 33.
Anthropic engineering blog (2025). How we built our multi-agent research system.
OpenAI file search documentation and Anthropic tool use documentation (last verified July 2026).
