Thoughts on the 2025 OWASP Top 10 for LLMs release
Attention Conservation Notice: A lightly-edited copy of a Bluesky thread – originally here – collecting some thoughts about the OWASP Top 10 for LLMs.
The OWASP folks released a Top-10 for LLM and GenAI applications recently (https://genai.owasp.org). Unlike the last version, I didn’t contribute to this one, and I now wish I’d at least tried to help course-correct it; there’s some stuff in there that I think could be a lot better. All of this, of course, should be read as my opinion. I’ve been working in LLM security for a while, so I think it carries some weight, but it’s still just one guy’s opinion. Also: something is better than nothing; even though I have critiques, I’m still glad OWASP is working in this area.
Anyway, let’s start with the things I like:
- Prompt injection: this is in many ways the fundamental vulnerability of LLM applications. They’ve got it first in their list – as expected – and while I could pick at separating direct and indirect prompt injection, the bulk of that one is solid. The mitigations are reasonable, and the scenarios broadly reflect things I’ve seen in practice. This is the most important one to get right, and they basically do. Indirect prompt injection is more important in practice IMO than “direct” prompt injection, but whatever; it’s a good entry.
- Improper Output Handling – I think this one is very good, to be honest. There’s a bit of confusion with least-privilege issues, but the general thrust of “know what consumes LLM outputs and make sure that you sanitize or neutralize accordingly” is both correct and critically important. (I sketch what I mean by that just after this list.)
- Next, I think Excessive Agency should really be renamed “Failure to Implement Least Privilege”, but the mitigations are good: they focus on the need for all tools/plugins to be narrowly scoped, implemented with least privilege, and to have their actions evaluated against security policy. (The second sketch after this list shows the kind of policy gate I have in mind.)
- Misinformation, finally, is IMO not strictly a security issue. It’s something that LLM users and developers need to be aware of and educate downstream consumers about. It needs to be called out; I just don’t think a security risks list is the place to do it. The content itself is IMO fine.
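To make the “know what consumes LLM outputs” point concrete, here’s a minimal sketch of what I mean – this is my illustration, not something from the OWASP document, and the consumers and allowed actions are made up:

```python
import html
import json

# Downstream actions we are willing to let model output trigger (illustrative).
ALLOWED_ACTIONS = {"search", "summarize"}

def render_to_page(llm_output: str) -> str:
    """LLM output headed for HTML is untrusted input: escape it, don't trust it."""
    return html.escape(llm_output)

def parse_tool_request(llm_output: str) -> dict:
    """LLM output headed for code paths gets parsed and validated, never eval'd."""
    try:
        request = json.loads(llm_output)
    except json.JSONDecodeError:
        raise ValueError("model did not return valid JSON")
    if request.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"model requested a disallowed action: {request.get('action')!r}")
    return request
```

And here’s the least-privilege/policy-gate idea from the Excessive Agency entry, sketched the same way – the tools, roles, and policy table are hypothetical, and a real system would pull these from its own authorization layer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict

# Narrowly scoped tools: each entry says which roles may call it (illustrative).
POLICY = {
    "read_calendar": {"roles": {"assistant_user"}},
    "send_email":    {"roles": {"assistant_admin"}},
}

def authorize(call: ToolCall, user_roles: set[str]) -> bool:
    rule = POLICY.get(call.tool)
    if rule is None:
        return False                        # unknown tool: deny by default
    return bool(rule["roles"] & user_roles)  # least privilege: role granted explicitly

def dispatch(call: ToolCall, user_roles: set[str]):
    # The model proposes; the application decides. The model never grants itself privileges.
    if not authorize(call, user_roles):
        raise PermissionError(f"tool call {call.tool!r} denied for this user")
    ...  # actually invoke the tool here
```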
And unfortunately, now we get to the bits that I think could have used some more work. Again, I offer this in the spirit of constructive criticism, and as someone who chose not to help with this version of the Top 10. I wasn’t in the room; I didn’t see the sausage being made.
Sensitive Information Disclosure feels like it muddles several issues: training data leakage (there’s an active debate in the literature about how feasible this really is), information that users provide to the system, and the Proof Pudding attack (which doesn’t involve genAI?). I won’t go point by point on the recommendations, but they suggest that differential privacy might provide adversarial robustness, or that federated learning might protect against completion-based data extraction. I’m not convinced; those attacks are not in the threat model for those techniques. They also talk about concealing the “preamble” to prevent “overriding” it (see LLM07:2025, though; system prompts are not secret). That one confuses me a bit: it feels like it’s really about mitigating prompt injection (where an info leak might be the goal of the PI attack).
Finally, they suggest homomorphic encryption, which – I’m sorry – is the wrong tool for the threat described here. HE is not very practical yet (though it’s getting better), but more importantly it defends against a different threat than the one in this section: it protects you from an untrustworthy party serving the model, keeping them from taking apart your model or snooping on your queries. If someone can send prompts and get corresponding completions back, HE does nothing to stop them from extracting data. I’m not going to go through the rest, but in general (again: IMO) it feels like there’s a disconnect between the threat description, the mitigations, and the scenarios. Training data, session data, and metadata (training algorithms, etc.) are all kind of mixed together.
Supply Chain: This tries to cover a very broad scope: software supply chain risks, model serialization risks, legal risks, model supply chain risks, and data supply chain risks. The recommendations here are mostly table stakes and common sense, but there are a few I want to call out. First, I don’t think anomaly detection on training data is actually all that useful. It hasn’t been demonstrated at scale, as far as I know, and I’m skeptical that you could effectively use it to detect malicious data. Willing to be convinced, though. Second, one of their scenarios is the LeftoverLocals attack (which allows recovery of session data from GPU memory). The rest of this section focuses on integrity, not confidentiality; I think that scenario fits better in Sensitive Information Disclosure. Finally, while having a patching policy is always good, I’m not sure it’s practical advice for models specifically – I don’t know of any good way to mitigate a model trained on bad data that isn’t “retrain from the jump and distribute new model weights.”
Data and Model Poisoning: this is not (IMO) adequately distinguished from Supply Chain. They also don’t mention what is (again, IMO) the single most important mitigation: put your training data under access control. It doesn’t matter if you clean it if someone can just contaminate it again. There are other things I could call out: they talk about RAG to combat hallucinations (it’s not clear how this relates to model poisoning), they aren’t very precise about the bias/poisoning distinction, and they propose to “Test model robustness with…adversarial techniques, such as federated learning”, which confuses training-time privacy controls with robustness tests (and see above w/r/t federated learning and adversarial robustness). But the big thing I keep coming back to is not recommending access control on training data as the first mitigation – just data versioning (which is also important, don’t get me wrong, but not nearly as important as RBAC IMO). That feels like a serious omission.
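For what it’s worth, here’s the shape of the mitigation I’d want listed first, as a rough sketch under my own assumptions (the directory layout, role name, and manifest format are all made up): training data sits behind an explicit write ACL, and every training run verifies the data against a pinned manifest before it starts.

```python
import hashlib
import json
from pathlib import Path

# Illustrative: the only role allowed to change training data.
WRITERS = {"data-curation-team"}

def assert_can_write(role: str) -> None:
    if role not in WRITERS:
        raise PermissionError(f"role {role!r} may not modify training data")

def manifest_for(data_dir: Path) -> dict[str, str]:
    """Hash every shard so later runs can detect silent tampering."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(data_dir.glob("*.jsonl"))
    }

def verify_before_training(data_dir: Path, pinned_manifest: Path) -> None:
    """Refuse to train if the data on disk no longer matches the pinned manifest."""
    expected = json.loads(pinned_manifest.read_text())
    if manifest_for(data_dir) != expected:
        raise RuntimeError("training data does not match the pinned manifest; refusing to train")
```

Versioning (the manifest) tells you something changed; the ACL is what stops it from changing under you in the first place. You want both, but the ACL comes first.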
System Prompt Leakage: Long-time listeners will know I have some opinions about this. To be fair to the OWASP folks, it’s a decent discussion under a confusing name. The real risk is that people think system prompts are secure. They aren’t. They are basically public. Just publish them. At the risk of being repetitive: you can’t put anything secret into the system prompt. You can’t put anything security-sensitive into the system prompt. This has been shown repeatedly in both LLMs and multimodal systems. You simply cannot rely on privacy in the system prompt. I am happy that the OWASP folks acknowledge this up front, but I wish they’d be more full-throated in the recommendation that ALL security controls must be implemented outside of the LLM. That’s way down at #4 on their list of mitigations. It should be #1; #2 and #3 are ways to implement it.
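To be concrete about what “outside the LLM” means, here’s a minimal sketch with made-up names and data: the system prompt can ask the model to behave, but the permission check lives in application code and runs no matter what the model says.

```python
# The system prompt is a request to the model, not a security control.
SYSTEM_PROMPT = "You are a support bot. Only discuss the user's own orders."

# Stand-in for a real order database (illustrative only).
ORDERS = {"order-123": {"owner": "alice", "status": "shipped"}}

def user_may_view_order(user_id: str, order_id: str) -> bool:
    order = ORDERS.get(order_id)
    return order is not None and order["owner"] == user_id

def handle_order_lookup(user_id: str, requested_order_id: str) -> str:
    # Enforced here, in code, regardless of what the prompt said or what the model decided.
    if not user_may_view_order(user_id, requested_order_id):
        return "You don't have access to that order."
    return f"{requested_order_id}: {ORDERS[requested_order_id]['status']}"
```

If the system prompt leaks, nothing above breaks, which is exactly the property you want.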
Vector and Embedding Weaknesses is another one with (IMO) an unclear name. The way it’s presented, I think it really could be broken up and distributed across Prompt Injection (indirect prompt injection), Excessive Agency (incorrect authorization), and Supply Chain. I do think they somewhat confuse indirect prompt injection and data poisoning. This is debatable, but (IMO) if you’re just adding junk data to the system so that it reproduces it, that’s poisoning; if you’re trying to directly and precisely control LLM output, that’s prompt injection.
Finally we get to Unbounded Consumption, which basically says “if you allow unlimited queries, bad things can happen”. They start by focusing on resource exhaustion attacks and driving up costs, but then pivot to model extraction and functional replication attacks. Which is fine, but the mitigations they propose (after the obvious ones like “rate limit”) don’t address unbounded consumption; they address the attacks that unbounded consumption enables (like model extraction). They talk about things like minimizing logits (common sense), watermarking (doesn’t help against extraction, and probably doesn’t work well in general), and the idea that adversarial training would somehow help defend against extraction attacks (has this been done? it seems dubious to me; or do they mean RLHF/instruction-hierarchy techniques?). They also mention filtering glitch tokens. That’s a good idea, but IMO it’s in the wrong entry: it should probably be in Prompt Injection. And then centralized model inventory – also in Supply Chain, and probably better placed in Model Poisoning – doesn’t seem to really address unbounded consumption at all.
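“Rate limit” is doing a lot of work in that entry, so here’s the kind of bound I’d actually want, sketched with invented limits and an in-memory store: a per-user request cap plus a per-user token budget, both checked before a request ever reaches the model.

```python
import time
from collections import defaultdict, deque

# Illustrative limits; real values depend on your cost model and users.
MAX_REQUESTS_PER_MINUTE = 30
MAX_TOKENS_PER_DAY = 200_000

_request_times: dict[str, deque] = defaultdict(deque)
_tokens_used: dict[str, int] = defaultdict(int)

def check_budget(user_id: str, estimated_tokens: int) -> None:
    """Raise before the request is forwarded if this user is over either bound."""
    now = time.time()
    window = _request_times[user_id]
    while window and now - window[0] > 60:   # drop requests older than one minute
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("rate limit exceeded; try again later")
    if _tokens_used[user_id] + estimated_tokens > MAX_TOKENS_PER_DAY:
        raise RuntimeError("daily token budget exhausted")
    window.append(now)
    _tokens_used[user_id] += estimated_tokens
```

Bounds like these don’t stop a determined extraction attack on their own, but they do address the actual “unbounded” part of the entry.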
Anyway, that’s from a first read-through. I hope these critiques are helpful, and again, it’s always easier to critique than to create; I give full credit to the folks who worked on this for getting something out. Even just having an OWASP Top 10 for LLM/GenAI exist sends a good message.
To sum up, these are (IMO) good:
- LLM01: Prompt Injection
- LLM05: Improper Output Handling
- LLM06: Excessive Agency (think “Least Privilege” as you read it)
- LLM08: Vector + Embedding Weaknesses (think “RAG Security”)
- Skim LLM09: Misinformation (is it really “security”? 🤷♀️)
For the ones where I do see room for growth, it’s mostly about harmonizing titles to threats, and then mitigations to those titles. “System Prompt Leakage” is a good example of the former: the name is misleading with regard to the content (the risk isn’t the prompt leaking; it’s assuming it won’t). For the latter, Data + Model Poisoning doesn’t mention access control on data (only tracking), and then includes prompt injection as a data poisoning scenario. You can find examples of this throughout. I know this is an evolving field, but that makes clarity even more important.
Once more for emphasis: that this exists is a net good. Highlighting that LLMs/GenAI have novel security risks that people may not be aware of is important. But the next step is to give clear guidance and mitigations for those risks, and I think that’s where there’s room for this to grow.