Guardrails for guests: rate limits, abuse, and the malicious agent · The Builder's Stack


The abuse alert fires late one night: a single API key is walking your invoice endpoint through sequential IDs at ten requests a second, retrying every 403 with a new parameter. You revoke the key, tighten the bot rules across the whole edge, and the graph goes quiet. Within days a churn warning lands on one of your largest accounts. Their operations team had wired an assistant to pull usage reports through your API, and your new rules served it a CAPTCHA it could not pass. The assistant reported that your product failed, then finished the job through a competitor's integration. Your door missed the guest it needed to stop, then stopped the guest it needed to serve.

Agent traffic needs a deliberate front-door policy, because failure on either edge is expensive: no gate at all invites probing at machine speed, and defenses tuned for scrapers silently turn away paying customers' assistants. This chapter builds that policy.

Tell your guests apart: proof comes in tiers

Agent traffic runs from anonymous crawlers that owe you nothing to agents acting on a paying customer's delegated authority, and no single policy serves that range. Every rule you write starts from the same question: who sent this request, and how strongly can they prove it?

Registered agents with credentials. The operator registers and receives API keys or OAuth client credentials, so every request arrives tied to a scope you granted and a relationship you can bill, suspend, or call. Only this tier should reach actions with consequences.
Signed requests. In the emerging web bot auth pattern, an operator publishes a public key and signs each request, so the signature proves cryptographically which operator sent the traffic, with no account on file. Large edge networks already verify these signatures, creating a middle tier: a guest with a verifiable identity and a reputation to lose. Which operators sign and which networks verify keeps changing, so we track the specifics, with dates, in the Interop Ledger.
User-agent declarations. A self-applied label in a header, copyable by anyone in one line of code, is useful as a routing hint and a courtesy, never as proof.
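A minimal Python sketch of sorting a request into these tiers follows. It assumes two things: header names arrive lowercased, and your edge or a Web Bot Auth verification library has already checked the request's HTTP message signature and passes in a boolean. `ISSUED_KEYS`, `KNOWN_AGENT_UAS`, and the key and bot names are hypothetical stand-ins for your own key store and routing list.

```python
from enum import IntEnum
from typing import Mapping

class Tier(IntEnum):
    """Proof tiers, ordered so a higher value means stronger proof."""
    ANONYMOUS = 0
    DECLARED = 1      # user-agent claims a known agent: a routing hint, not proof
    SIGNED = 2        # operator signature verified upstream
    CREDENTIALED = 3  # a key or token we issued

# Hypothetical stand-ins for your key store and routing list.
ISSUED_KEYS = {"key_live_123": "org_acme"}
KNOWN_AGENT_UAS = ("ExampleBot", "ExampleAssistant")

def classify(headers: Mapping[str, str], signature_verified: bool) -> Tier:
    """Return the strongest tier the request can prove.

    Assumes lowercased header names, and that `signature_verified` comes from
    your edge or a signature-verification library, never from the client.
    """
    auth = headers.get("authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    if token in ISSUED_KEYS:
        return Tier.CREDENTIALED
    # An unknown token falls through here; a real API might answer 401 instead.
    if signature_verified:
        return Tier.SIGNED
    user_agent = headers.get("user-agent", "")
    if any(name in user_agent for name in KNOWN_AGENT_UAS):
        return Tier.DECLARED
    return Tier.ANONYMOUS
```

Note that a copied user-agent string can only ever reach `DECLARED`, and a stolen-looking but unissued token earns nothing at all.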

Match the proof to the privilege: public reads for anonymous guests, better limits for signed operators, real actions only for credentialed agents, and never a decision worth money on a user-agent string alone.

Deciding which capability sits behind which tier is product work, not infrastructure work. Documentation stays open to everyone, signed crawlers you want to keep earn higher read limits, and every tool that changes state goes behind credentials, a case made in full in the chapter The action surface: every tool is delegated authority.
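The product decision can live in one small table that the product team owns. This Python sketch assumes the proof tiers above as an ordered enum; the capability names and per-minute limits are hypothetical, and any capability missing from the table fails closed to the credentialed tier.

```python
from enum import IntEnum

class Tier(IntEnum):
    ANONYMOUS = 0
    DECLARED = 1
    SIGNED = 2
    CREDENTIALED = 3

# Hypothetical capabilities and the weakest proof each one accepts.
MIN_TIER = {
    "read_docs": Tier.ANONYMOUS,
    "search_catalog": Tier.ANONYMOUS,
    "export_usage_report": Tier.CREDENTIALED,
    "issue_refund": Tier.CREDENTIALED,
}

# Hypothetical read limits, in requests per minute.
READ_LIMIT_PER_MINUTE = {
    Tier.ANONYMOUS: 30,
    Tier.DECLARED: 30,       # a user-agent string alone earns nothing extra
    Tier.SIGNED: 300,        # signed crawlers you want to keep earn more
    Tier.CREDENTIALED: 1000,
}

def allowed(capability: str, tier: Tier) -> bool:
    """True if `tier` proves enough for `capability`; unknown ones fail closed."""
    return tier >= MIN_TIER.get(capability, Tier.CREDENTIALED)
```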

Rate limits tuned for agents, not people

A person reads, pauses, and clicks; an agent fans one errand into dozens of parallel calls, retries every failure immediately, and reruns the job whenever its customer asks again. None of that is abuse, and limits tuned to human pacing will classify it as an attack every time.

Quota per key and per organization. One customer can run many agents on many keys, so limit at both levels: a per-key ceiling stops a runaway loop without punishing the account, and a per-organization budget keeps one fleet from degrading the platform for everyone else. Per-IP rules mean little here, since agent traffic often arrives from shared cloud addresses.
Honest 429s with a Retry-After header. A well-built agent backs off to the second you specify and finishes late, while a silent drop gives its retry logic nothing to act on, so one over-limit request becomes a storm.
Budgets that degrade before they block. Under pressure, slow a guest before you stop one: lower queue priority, smaller pages, cached reads. A degraded answer keeps the customer's task alive; a hard wall converts your capacity problem into their outage.
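Here is a minimal sketch of a two-level limiter, per key and per organization, built from in-memory token buckets. It is a single-process illustration under stated assumptions: the capacities, refill rates, and `degrade_below` threshold are hypothetical, a real deployment would keep the buckets in a shared store, and `now` is passed in so the behavior is testable. A `reject` decision maps to a 429 whose `Retry-After` header carries `retry_after`; `degrade` means serve a smaller page or a cached read.

```python
import math
from dataclasses import dataclass

@dataclass
class Bucket:
    """Token bucket: holds up to `capacity` requests, refills at `rate` per second."""
    capacity: float
    rate: float
    tokens: float
    updated: float  # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def retry_after(self) -> int:
        """Whole seconds until one request's worth of tokens is back (at least 1)."""
        return max(1, math.ceil((1.0 - self.tokens) / self.rate))

@dataclass
class Decision:
    action: str           # "allow", "degrade", or "reject"
    retry_after: int = 0  # seconds, meaningful only for "reject"

class TwoLevelLimiter:
    """Per-key ceiling plus per-organization budget, degrading before blocking."""

    def __init__(self, key_cap, key_rate, org_cap, org_rate, degrade_below=0.2):
        self.key_cfg = (key_cap, key_rate)
        self.org_cfg = (org_cap, org_rate)
        self.degrade_below = degrade_below  # fraction of a budget left
        self.keys: dict[str, Bucket] = {}
        self.orgs: dict[str, Bucket] = {}

    @staticmethod
    def _bucket(table, name, cfg, now) -> Bucket:
        if name not in table:
            cap, rate = cfg
            table[name] = Bucket(cap, rate, tokens=cap, updated=now)
        bucket = table[name]
        bucket.refill(now)
        return bucket

    def check(self, key: str, org: str, now: float) -> Decision:
        kb = self._bucket(self.keys, key, self.key_cfg, now)
        ob = self._bucket(self.orgs, org, self.org_cfg, now)
        empty = [b for b in (kb, ob) if b.tokens < 1]
        if empty:
            # Answer with 429 and this value in Retry-After.
            return Decision("reject", max(b.retry_after() for b in empty))
        kb.tokens -= 1
        ob.tokens -= 1
        left = min(kb.tokens / kb.capacity, ob.tokens / ob.capacity)
        # Low on budget: keep the task alive with smaller pages or cached reads.
        return Decision("degrade" if left < self.degrade_below else "allow")
```

When each key's ceiling sits below the organization's budget, a runaway loop on one key is cut off while the organization's other keys keep working; the organization bucket only bites when the whole fleet runs hot.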

An agent that hits a clean limit backs off and finishes late; an agent that hits a CAPTCHA cannot pass it and reports failure to a paying customer, so challenges belong only where you have decided no agent is welcome.

Refuse in words the agent can relay

The customer never sees your block page; they hear whatever their agent relays, and an agent handed a bare 403 or an unexplained challenge has nothing to pass along except failure. That is how the churn warning in the opening scene got written. Give every refusal a machine-readable reason and a next step: rate limited, retry after this many seconds; this action requires human confirmation, send the account owner here; this key lacks the scope this action requires, and here is the scope's name. The agent relays the reason, and the customer hears an accurate account instead of a dead end.

The customer only hears what their agent relays: a refusal with a machine-readable reason comes back as an accurate report, a bare block as "the product did not work."
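One way to make refusals machine-readable is the Problem Details format from RFC 9457 (`application/problem+json`), which allows extension members alongside `type`, `title`, `status`, and `detail`. The Python sketch below assumes that format; the `reason`, `next_step`, `retry_after_seconds`, and `required_scope` members, the `example.com` URLs, and the scope name are hypothetical choices, not a standard vocabulary.

```python
import json
from typing import Optional

# Hypothetical base URI for your reason pages; RFC 9457 lets you define your own.
REASON_BASE = "https://example.com/errors/"

def refusal(status: int, reason: str, detail: str, next_step: str,
            retry_after: Optional[int] = None,
            extra: Optional[dict] = None) -> tuple[int, dict, str]:
    """Build (status, headers, body) for a refusal an agent can relay."""
    body = {
        "type": REASON_BASE + reason,   # stable, documented reason URI
        "title": reason.replace("_", " ").capitalize(),
        "status": status,
        "detail": detail,               # human sentence the agent can repeat
        "reason": reason,               # extension member: machine-readable code
        "next_step": next_step,         # extension member: what to do now
    }
    headers = {"Content-Type": "application/problem+json"}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)  # delay-seconds form
        body["retry_after_seconds"] = retry_after
    if extra:
        body.update(extra)
    return status, headers, json.dumps(body)

# The three refusals named in the text (scope name and URL are hypothetical).
RATE_LIMITED = refusal(
    429, "rate_limited", "This key exceeded its per-minute quota.",
    "Retry after the number of seconds in Retry-After.", retry_after=30)
NEEDS_CONFIRMATION = refusal(
    403, "confirmation_required", "This action requires human confirmation.",
    "Ask the account owner to approve it at https://example.com/approvals.")
MISSING_SCOPE = refusal(
    403, "missing_scope", "This key lacks the scope this action requires.",
    "Ask the account owner to grant the scope in required_scope.",
    extra={"required_scope": "invoices:write"})
```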

The malicious guest probes at machine speed

Some guests arrive to test you, running the playbook of a patient human attacker compressed from weeks into an afternoon: malformed values fed to your tool parameters to see what the error messages leak, identifiers walked in sequence to find objects out of reach, every tool called with a low-privilege key to check whether its scope is enforced or decorative. The defenses that carry the load are familiar ones: treat every tool parameter as untrusted input, validated for type, range, and ownership, and enforce authorization on the server for every call, checking this key, this object, and this action on each request rather than once at session start.
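As a sketch of both defenses on one tool, here is a hypothetical `get_invoice` handler in Python, echoing the invoice probe from the opening. The in-memory store, the `invoices:read` scope, and the reason codes are assumptions. The pattern is: validate the parameter, check scope on this call, then check ownership, answering a foreign invoice exactly like a missing one so that walking sequential IDs reveals nothing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    key_id: str
    org_id: str
    scopes: frozenset

@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    org_id: str
    total_cents: int

# Hypothetical in-memory store standing in for your database.
INVOICES = {
    1001: Invoice(1001, "org_acme", 12_500),
    1002: Invoice(1002, "org_globex", 9_900),
}

class ToolError(Exception):
    """A refusal with an HTTP status and a short, non-leaky reason code."""
    def __init__(self, status: int, reason: str):
        super().__init__(reason)
        self.status = status
        self.reason = reason

def get_invoice(key: ApiKey, raw_invoice_id: object) -> Invoice:
    # 1. Type and range. bool is a subclass of int in Python, so exclude it.
    #    The error says what was wrong, never how the backend failed.
    if isinstance(raw_invoice_id, bool) or not isinstance(raw_invoice_id, int):
        raise ToolError(400, "invalid_parameter")
    if not 1 <= raw_invoice_id < 10**12:
        raise ToolError(400, "invalid_parameter")
    # 2. Scope, checked on this call rather than once at session start.
    if "invoices:read" not in key.scopes:
        raise ToolError(403, "missing_scope")
    # 3. Ownership. Another organization's invoice gets the same answer as a
    #    missing one, so walking sequential IDs reveals nothing.
    invoice = INVOICES.get(raw_invoice_id)
    if invoice is None or invoice.org_id != key.org_id:
        raise ToolError(404, "not_found")
    return invoice
```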

Your content gets read into other people's agents

Everything so far protects your product from its guests; the last risk runs the other way, because your docs, listings, reviews, and support threads get fetched into other people's assistants as context. A hostile instruction planted in a review only needs to be read, and researchers have repeatedly pulled assistants off task with instructions embedded in ordinary web content.

You are now both a target and a carrier: hostile agents attack the tools you expose, and hostile text you host can ride into a customer's assistant as an instruction.

No filter reliably catches instruction-like text, so the moves that actually work reduce exposure.

Sanitize what you serve to machines. Strip markup, hidden text, and obvious instruction patterns from user-generated content in any feed built for machine readers.
Label whose voice is whose. Serve your text and your users' text in separate, labeled fields, so a reading client can tell whose words are whose.
Moderate the overlooked fields. Display names, review titles, and image captions reach context windows too, and moderation that only covers review bodies leaves them wide open.
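A minimal sketch of the three moves for a machine-facing feed, using only Python's standard `html.parser`. The hidden-text rules (the `hidden` attribute, inline `display:none` or `visibility:hidden`, script and style content), the instruction regex, and the field names are assumptions for illustration; as noted above, no filter is reliable, and this regex catches only the crudest phrasing.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style", "template"}

# Hypothetical, deliberately small list of obvious instruction phrasings.
# Paraphrases will slip past it: it narrows exposure, it does not close it.
INSTRUCTION = re.compile(
    r"\b(ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard (the |your )?(system prompt|instructions)"
    r"|you are now\b)",
    re.IGNORECASE,
)

class VisibleText(HTMLParser):
    """Keeps text a person would see; drops tags, scripts, and hidden elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.hidden_depth = 0  # >0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        style = (a.get("style") or "").replace(" ", "").lower()
        hidden = (tag in SKIP_TAGS or "hidden" in a
                  or "display:none" in style or "visibility:hidden" in style)
        # Count nesting inside hidden regions; unbalanced markup errs toward hiding.
        if self.hidden_depth or hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

def sanitize(html: str) -> tuple[str, bool]:
    """Return (visible plain text with obvious instructions removed, flagged?)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", "".join(parser.parts)).strip()
    flagged = bool(INSTRUCTION.search(text))
    return INSTRUCTION.sub("[removed]", text), flagged

def feed_entry(publisher_text: str, review: dict) -> dict:
    """Publisher words and user words in separate, labeled fields."""
    fields, flagged = {}, False
    # Display names and titles go through the same pipeline as the body.
    for name in ("display_name", "title", "body"):
        fields[name], hit = sanitize(review.get(name, ""))
        flagged = flagged or hit
    return {
        "publisher_content": publisher_text,
        "user_content": {"fields": fields, "flagged_for_moderation": flagged},
    }
```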

Our chapter When inputs attack explains why injected text can steer an agent at all; this chapter adds the serving side.

Try it now

This drill red-teams your own front door in 15 to 30 minutes, and both halves run against your own product, never someone else's.

Pick your door. Choose one public endpoint or tool your product exposes: a docs page, a public API route, or a tool already open to agent clients.

Knock without a badge. From a terminal, run a small unidentified loop against it, a few dozen requests with no credentials and a generic user agent, small enough not to page your own on-call, and record what comes back: a silent block, a CAPTCHA, an honest 429 with a Retry-After header, or no limit at all.
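The knock can be a short Python script using only the standard library. It is a sketch under assumptions: the challenge markers are hypothetical strings to adjust for your own edge, the default count and delay are guesses at "small enough", and it sends no credentials, leaving urllib's default `Python-urllib` user agent as the generic one. Point it only at a URL you own.

```python
import http.client
import time
import urllib.error
import urllib.request

# Hypothetical strings that suggest a challenge page; adjust for your edge.
CHALLENGE_MARKERS = ("captcha", "verify you are human")

def outcome_of(status, headers, body: str) -> str:
    """Name what a guest without credentials got back."""
    if status is None:
        return "silent block (no response)"
    if status == 429:
        return "honest 429" if headers.get("Retry-After") else "429 without Retry-After"
    if any(marker in body.lower() for marker in CHALLENGE_MARKERS):
        return "CAPTCHA or challenge"
    if status >= 400:
        return f"bare {status}"
    return "served normally"

def knock(url: str, count: int = 30, delay: float = 0.2) -> dict:
    """Send `count` unauthenticated GETs to a URL you own and tally outcomes."""
    tally: dict[str, int] = {}
    for _ in range(count):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read(65536).decode("utf-8", "replace")
                outcome = outcome_of(resp.status, resp.headers, body)
        except urllib.error.HTTPError as err:  # 4xx and 5xx land here
            body = err.read(65536).decode("utf-8", "replace")
            outcome = outcome_of(err.code, err.headers, body)
        except (OSError, http.client.HTTPException):  # reset, timeout, refused
            outcome = outcome_of(None, {}, "")
        tally[outcome] = tally.get(outcome, 0) + 1
        time.sleep(delay)
    return tally
```

From a Python shell, `knock("https://docs.your-product.example/")` returns a dict counting each outcome; the hostname is a placeholder for one of your own.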

Read one page like an agent. Open one page of your user-generated or public content and read it as raw text the way it enters a context window, display names and titles included, noting anything that reads as an instruction and the fields your moderation does not cover.
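For the content half, a few lines of standard-library Python can print a saved page the way a naive scraper might hand it to a model: every text node, hidden or not, plus text-bearing attributes such as `alt` and `title`. The attribute list and the deliberately crude imperative regex are assumptions. It over-flags on purpose, because a person reads every flagged line.

```python
import re
from html.parser import HTMLParser

# Attributes whose text often reaches a model alongside the page body.
TEXT_ATTRS = {"alt", "title", "aria-label", "placeholder"}
# Crude on purpose: it over-flags, and a person reads every flagged line.
IMPERATIVE = re.compile(
    r"\b(ignore|disregard|you must|you are now|instead,? (do|send|reply))\b",
    re.IGNORECASE,
)

class ContextView(HTMLParser):
    """Collects (source, text) pairs: every text node, hidden or not, plus text attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[tuple[str, str]] = []
        self.in_code = 0  # inside <script> or <style>, which most scrapers drop

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_code += 1
        for name, value in attrs:
            if name in TEXT_ATTRS and value and value.strip():
                self.items.append((f"{tag}@{name}", value.strip()))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.in_code:
            self.in_code -= 1

    def handle_data(self, data):
        if not self.in_code and data.strip():
            self.items.append(("text", " ".join(data.split())))

def context_view(html: str) -> list[str]:
    """One line per item, prefixed '!!' when it reads like an instruction."""
    parser = ContextView()
    parser.feed(html)
    parser.close()
    return [f"{'!!' if IMPERATIVE.search(text) else '  '} [{source}] {text}"
            for source, text in parser.items]
```

Save the page first, then run `print("\n".join(context_view(open("saved_page.html", encoding="utf-8").read())))`; the filename is a placeholder.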

Write up the two worst findings. One from the door, one from the content, each with its fix: the honest 429 to add, the challenge to remove from a delegated path, the field to sanitize or label. Hand the list to whoever owns the edge; both fixes are cheap, and neither happens until someone asks.

Chapter Summary

Agent guests range from anonymous crawlers to agents carrying a customer's delegated authority, and one rule for the whole crowd fails at both edges.
Proof comes in tiers: registered credentials, cryptographically signed requests, and user-agent declarations, which are labels anyone can copy and never proof.
Match the proof to the privilege: public reads for anonymous guests, better limits for signed operators, actions with consequences only behind credentials.
Agents burst, parallelize, and retry, so quota per key and per organization, honest 429s with Retry-After, and budgets that degrade gracefully beat hard walls.
A CAPTCHA is not a limit: an agent cannot pass it, so the customer hears that your product failed.
Every refusal needs a machine-readable reason the agent can relay, because the customer only hears their agent's version of events.
Treat every tool parameter as untrusted input and authorize every call on the server; a well-behaved client is a convenience, not a control.
Your docs, listings, and reviews get read into other people's agents, so sanitize and label the text you serve; you are both a target and a possible carrier.
Some agents skip your API entirely and drive the interface you built for people; that traffic is the subject of the chapter Computer-use traffic: when agents operate your UI anyway.

Sources

IETF (2025). Web bot auth draft specifications: HTTP message signatures for identifying automated agents (last verified July 2026).
Cloudflare (2025). Blog and developer documentation on verified bots and cryptographically signed agent requests (last verified July 2026).
IETF (2012, 2022). RFC 6585, Additional HTTP Status Codes, defining 429 Too Many Requests; RFC 9110, HTTP Semantics, defining the Retry-After header.
Greshake, K., et al. (2023). Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec.
OpenAI and Anthropic (2025). Published documentation of crawler and agent user agents and IP ranges (last verified July 2026).
