AI-Native PM

Computer-use traffic: when agents operate your UI anyway

A customer writes in after a failed renewal: their assistant tried to update the payment card and reported that your product does not support the change. You pull the session replay expecting a bug and instead watch minutes of patient, alien navigation. The cursor travels in straight lines and tries the icon-only buttons in your settings header one at a time. On the billing page, the control that edits a card is a bare pencil glyph with no label, and the cursor clicks past it again and again. The one change it completes is confirmed only by a toast that vanishes before the next screenshot, so it retries the step twice. Nothing errored and your logs look normal, but the assistant reported the task impossible, and the customer believed it.

The earlier chapters in this part built the tool lane: tool definitions agents choose, delegation an agent can carry, and guardrails for guest sessions. This chapter covers the agent traffic that uses none of it. Computer-use agents drive the same interface humans do, clicking and typing through what you shipped, and the choice left to you is whether that interface fights them or works for them.

The browser is the lane you cannot close

A person's assistant works wherever the person works. Told to cancel a subscription or move a booking, it goes where the person would have gone, and for most of the world's software that is a browser tab, because most products expose no tool interface at all.

The capability went mainstream quickly: one major lab shipped computer use in late 2024, the others followed within months, and operator-style products now run entire checkouts this way, pausing only at payment for approval. The lineup changes too fast to keep in a chapter, so the current list lives in our dated Interop Ledger; the stable fact is that the web interface you already ship is now an agent interface.

You cannot opt out of agent traffic by declining to ship an API. A computer-use agent needs what a human needs: a screen, a pointer, and a login its principal handed it. Your product already provides all three.

Every product therefore runs two lanes for the same job. The tool lane takes a structured request and returns a structured result in one round trip; the UI lane runs the human loop (screenshot, decide, act, screenshot again) across dozens of steps. Agents take whichever lane exists and works.

What breaks an operator, and what helps

An operator works from two channels, the rendered screenshot and the accessibility tree, so anything hidden from both is hidden from the agent. Watch enough replays and the same snags repeat.

  • Novelty CAPTCHAs. A puzzle built so no machine can pass it stops a customer's delegated assistant as hard as the scraper it was aimed at.
  • Cookie and consent walls. Runs usually start in a fresh profile, so the banner gauntlet gets paid on every visit, and a banner that traps focus ends the run before it starts.
  • Moving elements. Layouts that shift as content loads and carousels that rotate mid-decision produce clicks aimed at where a button was one screenshot ago.
  • Unlabeled icon buttons. A bare glyph with no accessible name leaves the model only pixels to work from.
  • Canvas-rendered UI. A drawn interface exposes no elements; the accessibility tree comes back empty and everything must be inferred from pixels.
  • State that lives only in a toast. A confirmation that vanishes in seconds is gone before the next screenshot, and an action the agent cannot verify gets retried, which is how one cancellation becomes two.

The fixes are not exotic: semantic HTML with real buttons and labeled form fields, accessible names that stay stable across releases, a focus order that follows the visual flow, and state that stays visible on the page. This is the accessibility work you already owed human users. Annual audits of the top one million home pages (the WebAIM Million, listed in Sources) still find detectable failures on nearly all of them, so most products enter this era failing both audiences at once.

Agent operability and screen-reader operability are nearly the same checklist. An interface that tells assistive technology what each control is and what just changed tells an operator the same things.
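One way to start on that shared checklist is a rough static check for controls with no accessible name. The sketch below uses only Python's standard library; the class name, the function name, and the sample markup are illustrative assumptions, and the check is far cruder than the accessible-name computation browsers perform (a bare glyph typed as text, for example, would pass it). Treat it as a first filter and not as an audit.

```python
from html.parser import HTMLParser


class UnnamedControlAudit(HTMLParser):
    """Collects <button> and <a> elements that expose no obvious accessible name."""

    NAMING_ATTRS = ("aria-label", "aria-labelledby", "title")

    def __init__(self):
        super().__init__()
        self.unnamed = []   # (tag, attrs) for each control found without a name
        self._open = None   # the control currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("button", "a") and self._open is None:
            named = any((attrs.get(a) or "").strip() for a in self.NAMING_ATTRS)
            self._open = {"tag": tag, "attrs": attrs, "named": named}
        elif tag == "img" and self._open is not None:
            # Alt text on a wrapped image can name the control around it.
            if (attrs.get("alt") or "").strip():
                self._open["named"] = True

    def handle_data(self, data):
        # Any visible text inside the control counts as a name in this rough check.
        if self._open is not None and data.strip():
            self._open["named"] = True

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            if not self._open["named"]:
                self.unnamed.append((tag, self._open["attrs"]))
            self._open = None


def find_unnamed_controls(html):
    audit = UnnamedControlAudit()
    audit.feed(html)
    audit.close()
    return audit.unnamed


# Hypothetical settings header: one bare pencil icon, one labeled icon, one text button.
sample = """
<header>
  <button class="icon-edit"><svg viewBox="0 0 16 16"><path d="M0 0"/></svg></button>
  <button class="icon-edit" aria-label="Edit payment card"><svg viewBox="0 0 16 16"><path d="M0 0"/></svg></button>
  <button>Save changes</button>
</header>
"""

for tag, attrs in find_unnamed_controls(sample):
    print("no accessible name:", tag, attrs)
```

Run against the sample, only the first button is reported, which is the bare pencil glyph from the opening story. Pointing the same function at saved HTML from your own settings pages gives a quick list of candidates to check by hand.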

Friction is a design decision

Once the accidental snags are gone, the question flips: where should an operator have to stop? An agent that can glide through your settings can also glide through a workspace deletion its principal never asked for. Irreversible steps deserve friction placed deliberately: a confirmation stating the consequence in plain text ("this permanently deletes 14 projects") forces a well-built agent to carry the decision back to its principal, and slows a badly built one long enough to matter.

Deliberate friction is a checkpoint; accidental friction is a lost task. Put a plain confirmation in front of every irreversible step, and remove every other snag, because accidental snags protect nothing and cost you tasks you wanted completed.
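The rule can be sketched as a small server-side gate. Everything named here (Action, ConfirmationRequired, execute) is a hypothetical illustration and not any framework's API; the point is only that an irreversible action must carry a plain-text consequence and cannot run until the caller has explicitly confirmed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    irreversible: bool
    consequence: str = ""  # plain-text statement shown before an irreversible step


class ConfirmationRequired(Exception):
    """Raised when an irreversible action arrives without explicit confirmation."""


def execute(action, confirmed=False):
    if action.irreversible:
        if not action.consequence:
            # Refuse to ship an irreversible step with nothing to show the principal.
            raise ValueError(f"{action.name}: irreversible actions need a stated consequence")
        if not confirmed:
            # The message is the text an operator would carry back to its principal.
            raise ConfirmationRequired(action.consequence)
    # Reversible actions pass straight through: no accidental friction.
    return f"{action.name}: done"


rename = Action("rename_workspace", irreversible=False)
delete = Action("delete_workspace", irreversible=True,
                consequence="This permanently deletes 14 projects.")

print(execute(rename))
try:
    execute(delete)
except ConfirmationRequired as stop:
    print("needs confirmation:", stop)
print(execute(delete, confirmed=True))
```

The reversible rename runs without a prompt, while the deletion stops once with its consequence spelled out and proceeds only on an explicit second call.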

Decide the policy before you ship the detector

UI-driving agents can be detected, imperfectly. The tells are behavioral: cursor paths with no scroll physics, uniform timing between actions, no reading pauses. None of it is proof, the signals fade as the tools add human-like jitter, and any detector also flags some humans on assistive tooling, because the checklist overlap cuts both ways. Detection is worth having only after you choose what a positive triggers:

  • Welcome: let identified operators run, and keep the flow stable for them.
  • Redirect: tell the operator a better lane exists and where it starts.
  • Block: turn the session away, and accept that the customer hears "the product failed."

A detector shipped before that decision defaults to the worst outcome: a challenge page nobody chose, a task failed silently, and a customer who hears only that your product did not work. The generous response is to redirect: when the signals say operator, put a machine-readable note in the page itself saying an API exists for this task and where its documentation lives. The agent's next run takes the cheap lane, and how to make that lane worth choosing is the subject of "The tool interface is the storefront: design actions agents choose."
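A minimal sketch of "policy first, detector second" follows. It scores only one of the tells above, uniform timing between actions, as the coefficient of variation of the gaps, and it requires the caller to pass a policy before anything is decided. The threshold, the response shape, the agent_notice field, and the documentation URL are all placeholder assumptions. The service-doc link relation is used as one plausible carrier for the pointer; whether a given agent reads it is also an assumption.

```python
from enum import Enum
from statistics import mean, pstdev


class Policy(Enum):
    WELCOME = "welcome"
    REDIRECT = "redirect"
    BLOCK = "block"


def timing_cv(action_times):
    """Coefficient of variation of the gaps between actions; lower means more uniform."""
    gaps = [later - earlier for earlier, later in zip(action_times, action_times[1:])]
    if len(gaps) < 2 or mean(gaps) <= 0:
        return None  # too little evidence to score
    return pstdev(gaps) / mean(gaps)


def respond(action_times, policy, api_docs_url, cv_threshold=0.1):
    """Return a response description; cv_threshold is an untuned placeholder."""
    cv = timing_cv(action_times)
    if cv is None or cv >= cv_threshold:
        # Not confident this is an operator: serve the page as usual.
        return {"status": 200, "headers": {}, "agent_notice": None}
    if policy is Policy.BLOCK:
        return {"status": 403, "headers": {}, "agent_notice": None}
    if policy is Policy.REDIRECT:
        return {
            "status": 200,
            "headers": {"Link": f'<{api_docs_url}>; rel="service-doc"'},
            # Hypothetical machine-readable note to embed in the page itself.
            "agent_notice": {"api_available": True, "docs": api_docs_url},
        }
    # Policy.WELCOME: let the operator run on the normal page.
    return {"status": 200, "headers": {}, "agent_notice": None}


DOCS = "https://example.com/docs/api"        # placeholder URL
metronome = [0.0, 2.0, 4.0, 6.0, 8.0]        # evenly spaced actions, in seconds
person = [0.0, 1.2, 7.5, 8.1, 15.0]          # irregular gaps with reading pauses

print(respond(metronome, Policy.REDIRECT, DOCS))
print(respond(person, Policy.REDIRECT, DOCS))
```

A single timing signal will misfire, including on people using assistive tooling, so a real detector would combine several signals and keep every response recoverable. The design choice worth copying is only that respond has no default policy.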

A UI run costs far more than the tool call it replaces

The economics push in the same direction as the design. A UI run spends a model call per step, and every call carries a screenshot, so a task one tool call could finish burns an order of magnitude more compute, while your side serves dozens of page loads for an outcome one endpoint returns. Most of that spend lands on whoever operates the agent, which is still your problem: agents re-run vendor comparisons without loyalty, and dozens of fragile steps lose to a single reliable call.
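The arithmetic can be made concrete with a short calculation. Every number below is an illustrative placeholder and not a measurement or any provider's price: step counts, token counts per step, and per-token prices should be replaced with figures from your own runs. The sketch also assumes a flat token count per step, where a real UI run usually grows its context as it goes.

```python
def run_cost(steps, input_tokens_per_step, output_tokens_per_step,
             price_in_per_mtok, price_out_per_mtok):
    """Model spend for one run, in the currency of the per-million-token prices."""
    tokens_in = steps * input_tokens_per_step
    tokens_out = steps * output_tokens_per_step
    return (tokens_in * price_in_per_mtok + tokens_out * price_out_per_mtok) / 1_000_000


# Placeholder prices per million tokens (assumed, not quoted from any price list).
PRICE_IN, PRICE_OUT = 3.0, 15.0

# UI lane: 30 steps, each carrying a screenshot plus page context (assumed sizes).
ui_run = run_cost(steps=30, input_tokens_per_step=2000, output_tokens_per_step=100,
                  price_in_per_mtok=PRICE_IN, price_out_per_mtok=PRICE_OUT)

# Tool lane: one call with a tool definition and a structured result (assumed sizes).
tool_call = run_cost(steps=1, input_tokens_per_step=1500, output_tokens_per_step=200,
                     price_in_per_mtok=PRICE_IN, price_out_per_mtok=PRICE_OUT)

print(f"UI run:    {ui_run:.4f}")
print(f"Tool call: {tool_call:.4f}")
print(f"Ratio:     {ui_run / tool_call:.0f}x")
```

With these placeholders the UI run costs about thirty times the tool call. The ratio is driven almost entirely by the step count, which is why trimming steps from a UI flow helps far less than offering the single call.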

This is the per-task arithmetic of "The bill of materials: cost the task, not the call," running on a task you were never paid to serve. So make the cheap lane discoverable from inside the expensive one: link the API documentation from the pages operators visit most, and answer detected operators with that pointer rather than a challenge.

Try it now

This drill takes 20 to 30 minutes and produces a scored snag list for one flow you care about.

Pick the flow and stage it. Choose one real flow with a clear finish line (signup, a settings change, or a checkout) and run it in a test account so nothing irreversible is at stake.

Run the operator. Hand a computer-use agent (any major provider's, or an operator-style browser product) a one-line goal, such as "change the plan and stop before payment," and let it work without help. Scale it down: one short flow, one cheap model, screenshots reviewed by hand; you are collecting stalls, not statistics.

Score every stall. From the log or replay, record each place the agent clicked the wrong element, looped, or gave up, and note what was on screen: an unlabeled icon, a vanished toast, a banner, an element that moved.

Rank by the second audience. Take the top three snags and mark whether a person using a screen reader would hit the same one: an unlabeled icon fails both, a novelty puzzle mostly fails the agent. The doubles go to the top of the backlog, because each fix serves two kinds of user at once.
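If you would rather keep the snag list in a script than a spreadsheet, the scoring and ranking steps above can be sketched as follows. The Snag fields and the sample entries are hypothetical; the stall counts and the screen-reader flag are judgments you record by hand from the replay.

```python
from dataclasses import dataclass


@dataclass
class Snag:
    step: str                  # where in the flow the agent stalled
    cause: str                 # what was on screen
    stalls: int                # wrong clicks, loops, or give-ups seen at this step
    hits_screen_reader: bool   # would a screen-reader user hit the same snag?


def rank_backlog(snags):
    """Take the three worst snags, then put the ones that fail both audiences first."""
    top_three = sorted(snags, key=lambda s: s.stalls, reverse=True)[:3]
    return sorted(top_three, key=lambda s: (not s.hits_screen_reader, -s.stalls))


# Hypothetical results from one run of a billing flow.
observed = [
    Snag("billing: edit card", "unlabeled icon", stalls=4, hits_screen_reader=True),
    Snag("signup: verify", "novelty CAPTCHA", stalls=5, hits_screen_reader=False),
    Snag("billing: save", "vanished toast", stalls=2, hits_screen_reader=True),
    Snag("landing", "cookie banner", stalls=1, hits_screen_reader=False),
]

for position, snag in enumerate(rank_backlog(observed), start=1):
    audience = "both audiences" if snag.hits_screen_reader else "agent only"
    print(position, snag.step, "-", snag.cause, f"({audience}, {snag.stalls} stalls)")
```

In this sample the unlabeled icon and the vanished toast move ahead of the CAPTCHA even though the CAPTCHA caused the most stalls, because each of those two fixes serves both kinds of user.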

Chapter Summary

  • Some agent traffic never calls an API: computer-use agents operate the same interface humans do, and shipping no tool interface routes more traffic into the browser lane, not less.
  • The person's assistant works wherever the person works, every major model provider now ships the capability, and operator-style products already run entire checkouts this way.
  • Operators break on novelty CAPTCHAs, cookie walls, moving elements, unlabeled icon buttons, canvas-rendered UI, and state that lives only in a vanishing toast.
  • What helps them is the accessibility work you already owed humans: semantic HTML, stable accessible names, predictable focus order, and visible persistent state.
  • Operators break silently and never file tickets, so treat accessible names and flow order as a contract and test releases with one scripted agent run.
  • Put deliberate friction in front of irreversible steps so an agent has to carry the decision back to its principal, and remove accidental friction everywhere else.
  • Detection is imperfect and useful only with a policy chosen first: welcome, redirect, or block, never a detector with no decision behind it.
  • The generous response is to redirect: tell detected operators that an API exists, and make the cheap lane discoverable from inside the expensive one.
  • Next comes the money side of an agent that completes the purchase: "The business when a bot buys: pricing, attribution, brand."

Sources

  • Anthropic (2024). Introducing computer use, announcement and developer documentation (last verified July 2026).
  • OpenAI (2025). Introducing Operator, announcement, and ChatGPT agent announcement.
  • Google DeepMind (2024). Project Mariner announcement.
  • WebAIM (2025). The WebAIM Million: an annual accessibility analysis of the top one million home pages.
  • W3C (2023). Web Content Accessibility Guidelines (WCAG) 2.2, W3C Recommendation (last verified July 2026).