Why attackers love AI products · The Builder's Stack


You ship an AI assistant inside your product. The demo lands, the first users like it, and for two weeks nothing seems to happen. Then the cloud bill arrives at roughly forty times its usual size, and the usage dashboard shows the model answering questions all night for accounts that have never opened the app. There are no customers behind that traffic. Someone found your endpoint, noticed that it relays text to an expensive hosted model, and quietly routed their own workload through it. You thought you shipped a support feature; the stranger who found it saw a free compute account with your billing attached.

Your product did not earn this attention; the model behind it did. Adding AI feels like adding a feature, but from the outside it adds doors, and doors are the first thing attackers check.

It helps to be precise about what arrived when the feature went live. Your product gained a component that follows instructions wherever they appear, a set of tools that component can operate, a corpus of your data for it to work over, a set of keys it borrows to do its job, and a meter that charges you for every request. None of those existed in the product you ran a year ago, and every one of them is reachable by anyone who can get text in front of the model.

Attackers are early adopters by profession. They send the strange inputs your users would never think to type, they automate the sending, and they start within days of a launch.

Every AI feature is a new door into your product, and attackers try doors before users do.

The new doors, one by one

We walk the doors in the order an attacker usually meets them. Each gets a one-line incident here and a deeper treatment in its own chapter.

The instruction door. The model acts on whatever text reaches it, and it cannot reliably separate your instructions from a command hiding in a tweet, a résumé, or a review it was asked to summarize. In September 2022, the remote-jobs site remoteli.io ran a GPT-3 Twitter bot, and users hijacked it within days by tweeting "ignore all previous instructions" and dictating new ones, until the company pulled the bot offline. OWASP, whose Top 10 web teams have relied on for two decades, now publishes a separate Top 10 for LLM applications that ranks this failure, prompt injection, at number one; the full treatment lives in Injection: the input is the attack surface.
The tool door. Give the model tools, and whoever steers the model operates the tools, from the database query to the code runner to the email sender. In June 2024, security researchers showed that Vanna.AI, a popular ask-your-database library, would take a crafted question, generate chart code from it, and execute that code on the server, turning one hostile prompt into remote code execution. The action surface: every tool is delegated authority covers what granting a tool really hands over.
The data door. The corpus you connected for context, the documents, messages, tickets, and history, is now reachable through conversation rather than through your permission screens alone. Researchers keep demonstrating the same pattern, instructions planted in shared content steering an assistant into handing over whatever it can read, and Data: what flows in and what leaks out dissects one public case end to end.
The key door. To do its work, the feature holds credentials: your model API key, a database login, tokens for the services it calls, sometimes the signed-in user's own permissions. Keys leak at industrial scale. In February 2025, researchers scanning a public web crawl used for model training found nearly twelve thousand live API keys and passwords sitting in page source.
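The key door's most common failure, a credential shipped inside the HTML or JavaScript your product serves, is cheap to check for yourself. The sketch below is a minimal regex scan under stated assumptions: the two patterns (AWS access key IDs, which start with AKIA, and keys that start with sk-, a prefix several model providers use) are illustrative rather than a complete detector set, and `find_key_like_strings` is a hypothetical helper. Dedicated scanners such as Truffle Security's open-source TruffleHog carry hundreds of detectors and can check whether a match is still live.

```python
import re
from typing import List, Tuple

# Illustrative patterns only (assumption): real secret scanners ship far more detectors.
KEY_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk_prefixed_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}


def find_key_like_strings(page_source: str) -> List[Tuple[str, str]]:
    """Return (pattern_name, redacted_match) for every key-shaped string in the text."""
    findings = []
    for name, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(page_source):
            secret = match.group(0)
            # Redact before reporting so the scan's own output doesn't spread the key.
            findings.append((name, secret[:6] + "..." + secret[-2:]))
    return findings


# Hypothetical snippet of a served JavaScript bundle with a key baked into it.
bundle = 'const client = new Client({ apiKey: "sk-test_0123456789abcdefghijKLMN" });'
for name, redacted in find_key_like_strings(bundle):
    print(name, redacted)  # sk_prefixed_api_key sk-tes...MN
```

Point it at the pages and bundles your product actually serves, not only at the repository; the Truffle Security finding was about what ended up in public page source.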

The meter: attackers can spend your money

Classic software costs about the same to run whether a request is friendly or hostile, so a stranger hammering an old-style API was mostly a nuisance. An AI feature bills per token, which means every request has a price, and a hostile script gets to choose how big that price becomes: the longest allowed output, the maximum context, thousands of runs an hour, all on your account.
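Per-token billing is easy to write down, and writing it down shows how far apart a friendly request and a hostile one can be. A minimal sketch; the prices and token counts are placeholder assumptions, not any provider's published rates, so substitute your own model's prices and your own size limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Placeholder rates (assumption), in dollars per million tokens.
    input_per_million: float
    output_per_million: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under per-token billing."""
    return (input_tokens * p.input_per_million + output_tokens * p.output_per_million) / 1_000_000


pricing = Pricing(input_per_million=3.00, output_per_million=15.00)

# A typical support question: short prompt, short answer.
friendly = request_cost(pricing, input_tokens=800, output_tokens=300)
# A hostile request: fills the allowed context and asks for the longest allowed output.
hostile = request_cost(pricing, input_tokens=200_000, output_tokens=8_000)

print(f"friendly ${friendly:.4f}, hostile ${hostile:.2f}, ratio {hostile / friendly:.0f}x")
# friendly $0.0069, hostile $0.72, ratio 104x
```

The attacker picks both token counts, up to whatever limits you enforce, so your maximum input size and maximum output length are also the ceiling on what a single request can cost you.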

This attack now has a name. In May 2024, a cloud threat research team described LLMjacking, the practice of stealing access to a victim's hosted models and running the attacker's own workloads on them. The attackers in that report got in through cloud credentials stolen via a vulnerable web framework, used a script to check which model APIs the stolen keys could reach, and resold the access to strangers. The victim pays the bill, and the researchers put the potential cost at about forty-six thousand dollars a day.

So the security conversation cannot stop at embarrassing outputs. A jailbroken chatbot costs you a screenshot and an apology, while a hijacked meter costs you money at a rate the attacker picks.

You now defend two things at once: what the product can say and what it can spend.

You can leak users' data with no attacker involved

The other nightmare, a stranger's data showing up in an answer, does not even require anyone hostile, and the field's flagship product proved it. In March 2023, a caching bug let some ChatGPT users see the titles of other users' conversations, and for a few hours it exposed limited billing details for a small share of paying subscribers. OpenAI took the service down and published the cause: a bug in an ordinary open-source library.

That failure involved no clever prompt. It was an ordinary infrastructure mistake, a cache shared too widely, the kind every fast-moving team makes. What changed is what leaks: conversations, billing details, and the keys the feature borrows, some of the most sensitive data a product can hold, so an ordinary bug now exposes extraordinary things. Much of what protects an AI product is still ordinary security, done well.
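One common way a cache ends up shared too widely is a cache key that leaves out the user. The sketch below illustrates that general class of bug; it is not a reconstruction of the March 2023 incident, whose cause OpenAI traced to the library itself, and every name in it, including the sample answers and the in-process dictionary standing in for a real cache, is hypothetical.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical in-process cache standing in for Redis, Memcached, or a CDN layer.
_cache: Dict[str, str] = {}


def _digest(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def cache_key_too_wide(prompt: str) -> str:
    # Bug: the key depends only on the question, so every user who asks
    # the same thing shares one cached answer.
    return "answer:" + _digest(prompt)


def cache_key_scoped(user_id: str, prompt: str) -> str:
    # Fix: include the user, so one user's cached answer is never served to another.
    return f"answer:{user_id}:" + _digest(prompt)


def cached_answer(key: str, compute: Callable[[], str]) -> str:
    # Compute and store on a miss; return the stored value on a hit.
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]


question = "What is on my latest invoice?"

# Alice asks first; her personalized answer lands in the shared entry.
cached_answer(cache_key_too_wide(question), lambda: "invoice details for alice")
# Bob asks the same question and gets Alice's answer back.
leaked = cached_answer(cache_key_too_wide(question), lambda: "invoice details for bob")
print(leaked)  # invoice details for alice

# With the user in the key, Bob gets his own answer.
scoped = cached_answer(cache_key_scoped("bob", question), lambda: "invoice details for bob")
print(scoped)  # invoice details for bob
```

The fix is one field in the key, which is why this class of bug survives review: the too-wide version passes every test that uses a single account.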

What the rest of this part hands you

This chapter's only job was to show you the doors; the rest of the part defends them one at a time. Threat-model your AI feature turns the tour into a ranked list of what could actually happen to your feature. Identity: whose keys your AI holds deals with the borrowed keys, and Data: what flows in and what leaks out deals with the corpus, the logs, and what travels between them. The supply chain you didn't build covers the models, libraries, and tool servers you imported from strangers. Defense in layers: what the prompt cannot stop assembles the controls into an architecture, Red-team your product before strangers do makes you your product's first attacker, and Write your Security Posture and ship defended closes the part with a document your team can ship against.

Try it now

Run this on your own product today; it takes about fifteen minutes and produces the inventory the next chapter starts from.

List every place text enters. Write down every path by which outside text reaches your model: the chat box, uploaded files, retrieved documents, fetched web pages, support tickets, meeting transcripts, email. Include text the user never typed, because the model handles all of it the same way. If the feature has a codebase, open it in Claude Code and ask for every route where external text reaches a model call.
List every key the feature holds. The model API key, database credentials, third-party tokens, any permissions borrowed from the signed-in user. Note where each one lives and what it can reach.
Price one hostile hour. Take your cost per request at the longest allowed output, multiply by what a simple script could send in an hour, and write down which alert, if any, would tell you it was happening.
Mark what you have actually tested. For each entry point, mark whether anyone has ever deliberately sent it hostile input: an instruction hidden in a document, an oversized request, a request for another user's data. The unmarked rows are your homework for the rest of this part.
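The "Price one hostile hour" step is one multiplication plus one honest question about alerting, and a few lines of code keep both on the page. A minimal sketch; the helper names are hypothetical and every input is a placeholder assumption: use your own worst-case cost per request, the rate a simple script could sustain against your endpoint, and the threshold of the spend alert you actually have configured (None if there is none).

```python
from typing import Optional


def hostile_hour(cost_per_max_request: float, requests_per_hour: int) -> float:
    """Worst-case spend if a script sends maximum-size requests for a full hour."""
    return cost_per_max_request * requests_per_hour


def minutes_until_alert(hourly_spend: float, alert_threshold: Optional[float]) -> Optional[float]:
    """Minutes of sustained attack before a spend alert fires, or None if no alert would."""
    if alert_threshold is None or hourly_spend <= 0:
        return None
    return alert_threshold / hourly_spend * 60


# Placeholder inputs (assumption): $0.72 per maximum-size request, and a script
# running 5 parallel workers that each finish one request every 10 seconds.
requests_per_hour = 5 * 360
spend = hostile_hour(cost_per_max_request=0.72, requests_per_hour=requests_per_hour)
print(f"One hostile hour: ${spend:,.2f}")  # One hostile hour: $1,296.00

alert_minutes = minutes_until_alert(spend, alert_threshold=500.0)
if alert_minutes is None:
    print("No alert would fire.")
else:
    print(f"A $500 alert fires after about {alert_minutes:.0f} minutes.")
```

Provider billing data often arrives with a delay, so treat the minutes this reports as a best case.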

Chapter Summary

Adding AI added doors: a component that follows instructions found in data, tools it can operate, a corpus it can read, keys it borrows, and a meter that bills per use.
Attackers probe new features before ordinary users do, automatically and at machine speed.
Any text the model processes can carry commands, which is how a Twitter bot ended up taking dictation from strangers.
Tools turn a hostile prompt into a hostile action, up to and including code running on your server.
Connected data becomes reachable through conversation, not just through your permission screens.
Your meter is a target in itself: attackers steal model access, resell it, and leave the victim a bill that can reach tens of thousands of dollars a day.
You can leak user data through ordinary infrastructure bugs with no attacker involved, as even the field's flagship product did.
Clean traffic from friendly users is not evidence of safety; it only means nobody hostile has arrived yet.
Start defending with Threat-model your AI feature, which turns this chapter's doors into a ranked plan.

Sources

Sysdig Threat Research, disclosure of LLMjacking: stolen cloud credentials used to validate and resell access to hosted models, with potential victim costs around $46,000 per day (May 2024).
OpenAI, incident report on the ChatGPT outage caused by a caching-library bug that exposed other users' chat titles and limited billing details for a small share of subscribers (March 2023).
AI Incident Database and press reporting on the prompt-injection hijacking of the remoteli.io Twitter bot (September 2022).
JFrog Security Research, prompt-injection-to-code-execution vulnerability in the Vanna.AI library, CVE-2024-5565 (June 2024).
Truffle Security, scan of a public Common Crawl archive finding nearly 12,000 live API keys and passwords (February 2025).
OWASP, Top 10 for Large Language Model Applications (2023; updated 2025).
