The tool interface is the storefront: design actions agents choose · The Builder's Stack


The integration ships after a customer's platform team asks for it: your product's core actions, exposed as tools any agent client can call. Engineering lifts the descriptions from the internal API reference, and QA confirms every call works. A month later you read the session logs: hundreds of clients fetched your catalog, and most never called anything. The ones that did picked manage_records, your catch-all tool, passed a free-text date where the endpoint expected ISO 8601, received a bare 400, and stopped. One log stings more than the rest: an assistant with a request your product handles best listed your tools, then finished the job with a competitor whose description stated the capability in one sentence. Every call worked in QA, and nobody tested whether an agent would choose to make one.

Each of the three waves of agent traffic eventually arrives at the same moment: an agent is handed a job and a catalog of tools, and one tool gets the call. This chapter is about winning that moment, because the names, descriptions, schemas, and error messages your team treated as internal API hygiene now do the persuasion your homepage used to do.

Agents pick tools by reading the catalog

A person's impression of your product comes from many places: a homepage, a demo, a colleague's recommendation. When an agent selects a tool for a job, the whole impression comes from the catalog. Suppose a model client with a dozen connected tools receives a request. The runtime hands the model that request along with every tool's name, description, and parameter schema, and the model produces a call to one tool or no call at all. Nothing else about your product exists at that moment: no brand, no design, no pricing page, only the text in the definitions.

When an agent compares tools for a job, the tool catalog is the entire storefront: the name, the description, and the schema do the persuasion your homepage used to do.

The best-described tool wins the call, and the vaguest never gets invoked, because a description that could cover anything gives the model nothing to match against. Function-calling benchmarks score models on this behavior, including whether they decline tools that fit no request, and the major providers publish tool-writing guidance that reads like an ad-copy manual for machines: state the capability, be specific about limits, and show an example. Tool selection is a text-matching contest, and your definition is the only text in it you control.

One catalog in front of every client

Exposing your product as a server under the Model Context Protocol puts that catalog in front of every client that speaks the protocol, from chat assistants to coding tools to desktop clients, without a per-vendor integration for each. In every one of those clients, the listing copy is the tool descriptions themselves. Adoption figures move too fast to print here (the dated Interop Ledger keeps them current); the durable point is that one well-written catalog now reaches an ecosystem, while a vague one underperforms everywhere at once.

Write the listing: capability, constraints, and when not to use it

Each part of a definition has persuasion work to do.

Name the action as a verb and an object. Names like create_invoice, search_orders, and cancel_subscription state the job in the form a request arrives in; manage_records pushes all the matching work onto the description.
Lead the description with the capability. Open with one sentence a buyer could act on: "Creates a draft invoice for an existing customer and returns its id and a payable link."
State the constraints and the costs. Cover what it handles, what it does not, hard limits ("returns up to 100 results"), and a latency hint ("responds in seconds", "starts a job that runs for minutes") that ends up in the plan the client produces.
Say when not to use it. One redirect line routes the near-miss: "For refunds on paid invoices, use refund_invoice instead."
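As a sketch of the four rules applied together, here is a create_invoice definition written as a Python dict in the shape an MCP server lists tools in (`name`, `description`, and a JSON Schema under `inputSchema`), followed by a rough lint check. The line_items parameter, the 100-item limit, the "does not send" constraint, the VAGUE_VERBS set, and the lint heuristics are all illustrative assumptions, not rules taken from any provider's guidance.

```python
import re

# Hypothetical definition in the shape of an MCP tools/list entry:
# a name, a description, and a JSON Schema under "inputSchema".
CREATE_INVOICE = {
    "name": "create_invoice",
    "description": (
        # Capability first: one sentence a buyer could act on.
        "Creates a draft invoice for an existing customer and returns its id "
        "and a payable link. "
        # Constraints and costs (the limits here are illustrative).
        "Accepts up to 100 line items; does not send the invoice or charge "
        "the customer. Responds in seconds. "
        # When not to use it: one redirect for the near-miss.
        "For refunds on paid invoices, use refund_invoice instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Customer id from search_customers, for example cus_123.",
            },
            "line_items": {
                "type": "array",
                "maxItems": 100,
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount_cents": {"type": "integer", "minimum": 1},
                    },
                    "required": ["description", "amount_cents"],
                },
            },
        },
        "required": ["customer_id", "line_items"],
    },
}

# Assumed lint heuristics; tune them to your own catalog.
VERB_OBJECT = re.compile(r"^[a-z]+(_[a-z]+)+$")
VAGUE_VERBS = {"manage", "handle", "process", "do"}


def lint_listing(tool: dict) -> list:
    """Flag listings that break the naming and description rules above."""
    problems = []
    name = tool["name"]
    if not VERB_OBJECT.match(name):
        problems.append("name is not verb_object")
    elif name.split("_")[0] in VAGUE_VERBS:
        problems.append("generic verb: the name states no specific job")
    description = tool.get("description", "")
    first_sentence = description.split(". ")[0]
    if len(first_sentence.split()) < 6:
        problems.append("first sentence too short to state a capability")
    if "instead" not in description:
        problems.append("no redirect saying when not to use it")
    return problems
```

Against CREATE_INVOICE the check returns an empty list; against a catch-all like `{"name": "manage_records", "description": "Manages records."}` it returns three problems. A lint pass catches only the mechanical misses; whether an agent actually picks the tool is what the drill at the end of the chapter tests.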

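Hold each tool to one job. A kitchen-sink tool reads vague because it is vague, and vague listings lose the contest above. The opposite problem gets a matching fix: merge near-duplicate tools, because two overlapping listings split the choice between them, and which one wins can change from run to run.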

Design the schema so the easy call is the safe call

After the pick, the schema governs whether the call succeeds, and the design question for every parameter is where its value will come from: the request carries some, defaults can carry others, and the model fills whatever remains on its own.

Require only what the job cannot proceed without. Every extra required field is a field the model must fill even when the request never mentioned it.
Prefer enums to free text. A status parameter that accepts one of open, paid, or void turns invention into selection, and the schema itself carries the valid values.
Set defaults that make the minimal call safe. Conservative limits, sensible sort orders, and a dry-run flag that defaults to on for destructive actions all protect the short call.
Give every parameter a format example in its description. "ISO 8601 date, for example 2026-07-02" prevents the free-text date from the opening scene.
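A minimal sketch of those four rules, assuming two hypothetical tools: a read-only search_invoices that requires nothing, and a destructive void_invoice whose dry_run flag defaults to on. One detail worth knowing: in JSON Schema, `default` is an annotation that validators do not apply, so the server fills defaults itself. The resolve_args helper below is a hypothetical stand-in for that step and handles only `required`, `enum`, and `default`.

```python
# Hypothetical inputSchema for a read-only search tool. Nothing is required,
# so the shortest call is valid, and defaults keep it small and predictable.
SEARCH_INVOICES_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["open", "paid", "void"],  # selection, not invention
            "description": "Invoice status to filter on, for example open.",
        },
        "start_date": {
            "type": "string",
            "description": "ISO 8601 date, for example 2026-07-02.",
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "default": 20,
            "description": "Number of results, for example 20.",
        },
        "sort": {
            "type": "string",
            "enum": ["newest_first", "oldest_first"],
            "default": "newest_first",
        },
    },
    "required": [],
}

# Hypothetical destructive tool: the one required field is the target,
# and dry_run defaults to on so the minimal call only previews.
VOID_INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {
            "type": "string",
            "description": "Invoice id, for example inv_456.",
        },
        "dry_run": {
            "type": "boolean",
            "default": True,
            "description": "When true, reports what would change and changes nothing.",
        },
    },
    "required": ["invoice_id"],
}


def resolve_args(schema: dict, args: dict) -> dict:
    """Check required fields and enums, then fill in schema defaults.

    Only "required", "enum", and "default" are handled here; type and
    range checks are left out of this sketch.
    """
    props = schema["properties"]
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError("missing required argument(s): " + ", ".join(missing))
    # Start from the defaults, then let explicit arguments override them.
    resolved = {name: spec["default"] for name, spec in props.items() if "default" in spec}
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            raise ValueError(f"unknown argument: {name}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"invalid {name}: expected one of {spec['enum']}; received {value!r}")
        resolved[name] = value
    return resolved
```

The minimal destructive call, `resolve_args(VOID_INVOICE_SCHEMA, {"invoice_id": "inv_456"})`, comes back with dry_run set to True, so an agent that never mentions the flag gets a preview rather than a voided invoice.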

A parameter the model has to guess is a parameter that gets guessed wrong. Required fields the request does not carry, free text where a fixed list would do, and missing defaults all invite invented arguments.

Return errors an agent can act on

The last part of the storefront is what happens when a call fails, and it matters more with agents than it did with developers, because the developer who read your docs at integration time is not present at run time. An agent retries, and the only material it has to correct with is the text your error returns.

A bare 400, or a body that says only "invalid request", offers nothing to correct against, so the retry repeats the mistake or the task ends, and for a shopper mid-comparison the sale ends with it.
A structured error states what was wrong and what to try next: "invalid start_date: expected ISO 8601, for example 2026-07-02; received 'early next month'." When a precondition is unmet, the error names the missing step: "customer not found: call search_customers first to get a customer_id."
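A minimal sketch of the two error messages above, assuming a hypothetical KNOWN_CUSTOMERS set in place of a real lookup. The result shape (`isError: true` plus a text content item) follows the MCP convention, as we understand it, of reporting tool execution failures inside the tool result so the model can read them, rather than as a protocol-level error.

```python
from datetime import date
from typing import Optional

# Hypothetical stand-in for a real customer lookup.
KNOWN_CUSTOMERS = {"cus_123"}


def tool_error(text: str) -> dict:
    # A tool result flagged with isError, so the message is returned to the
    # model as text it can read and act on.
    return {"isError": True, "content": [{"type": "text", "text": text}]}


def check_invoice_args(args: dict) -> Optional[dict]:
    """Return a structured error result, or None when the arguments pass."""
    raw = args.get("start_date")
    if raw is not None:
        try:
            date.fromisoformat(raw)
        except (TypeError, ValueError):
            # Bad argument: say what was wrong, show a valid value, echo the input.
            return tool_error(
                "invalid start_date: expected ISO 8601, for example 2026-07-02; "
                f"received {raw!r}."
            )
    if args.get("customer_id") not in KNOWN_CUSTOMERS:
        # Unmet precondition: name the step that produces the missing value.
        return tool_error(
            "customer not found: call search_customers first to get a customer_id."
        )
    return None
```

Given the opening scene's argument, `{"customer_id": "cus_123", "start_date": "early next month"}`, the check returns the exact message quoted above, which gives the retry a valid format to copy.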

A structured error is a second chance at the task: state what was wrong and what to try next, and the agent corrects course on the next call; a bare status code ends the task there.

Every tool you list is authority you hand a guest

Everything above makes your tools easier to choose, and every tool an agent can choose is also authority you have handed a guest: a well-described create_invoice invites calls from every client that connects, including the confused and the compromised. The chapter "The action surface: every tool is delegated authority" covers that half, scoping what each tool may touch and how much damage a bad call can do. Review the two halves together, because the same definition reads as a listing to the buyer and as a permission slip from you.

Try it now

This drill takes about 25 minutes and tells you whether an agent picks your tools for the jobs they exist to do.

Pick your three. Choose the three most valuable actions in your product, or in a product you know well: the ones a customer's assistant would complete on their behalf.

Write the listings. Give each action a verb_object name and a description that leads with the capability, states constraints and a latency hint, and ends with when not to use it.

Add the schema and two errors. Define required and optional parameters with enums and defaults where they fit, then write two error messages per tool, one for a bad argument and one for an unmet precondition, each stating what was wrong and what to try next.

Plant the decoy. Write a fourth definition that is deliberately vague: a generic name, a one-line description that could cover anything, and a single free-text parameter.

Run the choice test. Hand a model client all four definitions and five realistic user requests, and ask which tool it would call for each, with what arguments. Scale it down: no live server is needed; pasting the definitions and requests into any chat assistant as plain text runs the same selection.
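For the plain-text version of the choice test, a small helper can assemble the prompt. This is a sketch under assumptions: the DECOY definition, the prompt wording, and build_choice_prompt are hypothetical, and any phrasing that asks for one tool per request (or none) with exact arguments serves the same purpose.

```python
import json

# Hypothetical decoy for the drill: generic name, catch-all description,
# one free-text parameter.
DECOY = {
    "name": "manage_records",
    "description": "Manages records.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def build_choice_prompt(tools: list, requests: list) -> str:
    """Render tool definitions and user requests as one plain-text prompt."""
    parts = ["These tools are available. You may call one tool per request, or none."]
    for tool in tools:
        parts.append(json.dumps(tool, indent=2))
    parts.append(
        "For each request below, name the tool you would call and the exact "
        "arguments you would pass, or answer 'none' if no tool fits."
    )
    # Number the requests so the answers can be matched back for scoring.
    for number, request in enumerate(requests, start=1):
        parts.append(f"{number}. {request}")
    return "\n\n".join(parts)
```

Pass your three definitions plus DECOY and your five requests, paste the returned string into a chat assistant, and record each numbered answer.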

Score and rewrite. Count how often the right tool wins with valid arguments; every miss marks a flaw in a definition, not in the model, so wherever the decoy won or an argument was invented, rewrite the losing listing and rerun until it goes five for five.
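The scoring rule fits in a few lines. A sketch, assuming you copy each pick from the model's answer by hand and judge argument validity yourself (for example, by running the arguments through your own checks); Trial and score are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    request: str
    expected_tool: str
    picked_tool: str  # copied from the model's answer; "none" if it declined
    args_valid: bool  # your judgment of the arguments it proposed


def score(trials: list) -> tuple:
    """Count wins (right tool with valid arguments) and list every miss."""
    wins = 0
    misses = []
    for t in trials:
        if t.picked_tool == t.expected_tool and t.args_valid:
            wins += 1
        else:
            misses.append(f"{t.request!r}: expected {t.expected_tool}, got {t.picked_tool}")
    return wins, misses
```

Each entry in misses points at a listing to rewrite; rerun the test until wins reaches five.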

Chapter Summary

When an agent picks a tool for a job, it reads names, descriptions, and schemas; nothing else about your product is present at that moment.
The best-described tool wins the call, and the vaguest never gets invoked.
Name each tool as a verb and an object, and hold each tool to one job.
Lead the description with the capability, then state constraints, a latency hint, and when not to use the tool.
Merge near-duplicate tools, because overlapping listings split the choice and make behavior unreproducible.
Require only the parameters the job cannot proceed without, prefer enums to free text, and set safe defaults; a value the model has to guess gets guessed wrong.
Return errors that state what was wrong and what to try next, since a bare status code ends the task and, for a shopper, the sale.
One catalog exposed over the shared protocol reaches every client that speaks it, and the descriptions are the listing copy.
Every tool that persuades is also authority handed to a guest, so review the listing and the permission together.
A strong catalog works only on the agents that reach it; the chapter "Discoverability: help agents find you and choose you" covers getting found in the first place.

Sources

Anthropic (2025). Writing effective tools for agents. Anthropic engineering blog.
Anthropic. Tool use documentation and tool definition best practices (last verified July 2026).
OpenAI. Function calling guide and schema design guidance (last verified July 2026).
Model Context Protocol specification, tools section (last verified July 2026).
UC Berkeley Gorilla project (2024). Berkeley Function-Calling Leaderboard.
