LLM-Specific Attacks
Prompt injection (direct & indirect), jailbreaking, prompt leaking, training data extraction, hallucination exploitation, agent manipulation.
What is LLM-Specific Attacks?
Large Language Model (LLM) security is the fastest-evolving attack surface in cybersecurity. As organizations rush to deploy chatbots, copilots, and AI agents, they're exposing themselves to an entirely new class of vulnerabilities that traditional security tools can't detect — prompt injection, jailbreaking, training data extraction, and agent manipulation.
Prompt injection is particularly dangerous because it turns the AI's own capabilities against the system. A carefully crafted input can make an LLM ignore its instructions, leak system prompts, access unauthorized data through tool use, or execute actions the user shouldn't be able to trigger. Unlike SQL injection where the payload syntax is well-defined, prompt injection payloads are expressed in natural language — making them harder to filter and detect.
The OWASP Top 10 for LLM Applications has become the definitive framework for understanding these risks, and the field is seeing rapid development of defensive techniques including guardrails, output filtering, prompt firewalls, and multi-model safety architectures.
Why it matters
Every organization deploying LLMs is exposing a new attack surface. Understanding LLM-specific attacks is no longer optional — it's as fundamental as understanding web application security was a decade ago.
LLM attacks represent the intersection of traditional application security and AI safety. As AI systems gain more capabilities (tool use, code execution, data access), the blast radius of successful prompt injection grows exponentially.
AI & Quantum Futures
The emerging stack reshaping cybersecurity from both directions — AI toolkit, AI attack surface, and the quantum transition.
Other domains in this layer
Key topics
People shaping this field
Researchers and practitioners worth following in this space.
Researcher on prompt injection and LLM security
Researcher on indirect prompt injection
Google DeepMind researcher on adversarial ML and LLM attacks
Curated resources
Authoritative sources we ground LLM-Specific Attacks questions in — frameworks, research, guides, and tools.
OWASP Top 10 for LLM Applications 2025
The definitive security risk list for LLM-powered applications. Covers prompt injection, insecure output handling, training data poisoning, and more.
MITRE ATLAS
Adversarial Threat Landscape for AI Systems. ATT&CK-style knowledge base of adversarial ML techniques, tactics, and real-world case studies.
Extracting Training Data from Large Language Models (Carlini et al. 2021)
Demonstrated that LLMs memorize and can be prompted to regurgitate training data verbatim, including PII. Foundational work on LLM privacy risks.
Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al. 2023)
The GCG attack paper. Showed that adversarial suffixes can bypass safety alignment in LLMs, transferring across models.
Not What You've Signed Up For: Compromising RAG with Indirect Prompt Injection (Greshake et al. 2023)
Demonstrated indirect prompt injection attacks through RAG documents, emails, and web content. Essential reading for RAG security.
Many-Shot Jailbreaking (Anthropic 2024)
Demonstrated that long-context LLMs can be jailbroken by providing many examples of the desired behavior. Scales with context window size.
Crescendo: Multi-Turn LLM Jailbreak Attack (Microsoft 2024)
Showed that gradually escalating benign conversations can bypass safety filters over multiple turns. Defeats per-message safety checks.
Anthropic Research Index
Collection of Anthropic's published research on AI safety, alignment, interpretability, and security.
Black Hat / DEF CON Archives
Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks particularly relevant for Pillars B and C.
NIST AI 600-1 — AI RMF Generative AI Profile
Companion to AI RMF 1.0 specifically for generative AI. Maps 12 GenAI risks to RMF actions. Covers CBRN, CSAM, confabulation, data privacy, environmental, human-AI interaction, information integrity, IP, obscenity, toxicity, value chain.
OWASP — "Top 10 for LLM Applications: Agentic Applications" (2025 supplement)
Extension of the LLM Top 10 specifically for agentic patterns. Covers excessive agency, insecure plugin/tool design, and multi-agent trust boundaries.
Fang et al. — "LLM Agents Can Autonomously Hack Websites" (2024)
Demonstrated GPT-4 exploiting real-world web vulnerabilities autonomously. 73% success rate on day-one CVEs. Key reference for questions about AI-augmented offensive capabilities and the asymmetry debate.
Roles where this matters
Career paths where this domain shows up as core or recommended.
Secure AI/ML systems from adversarial attacks, data poisoning, and model compromise. The fastest-growing specialization in cybersecurity.
The policy/controls counterpart to the AI Security Engineer — owns risk frameworks, regulatory mapping (EU AI Act, NIST AI RMF), model documentation, and AI incident response policy.
Secures the platform that trains, stores, and serves ML models — multi-tenant GPU isolation, pipeline integrity, feature-store hygiene, secrets management in ML workflows.
Embedded in a product team — owns threat modelling, secure design, libraries, dependency risk, and increasingly the AI-specific hardening of LLM features the product ships.
Certifications that signal this domain
Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.
Core coverage
Certified Offensive AI Security Professional
EC-Council certification for offensive AI security. Focus on Prompt Injection, Model Extraction, Training Data Poisoning, Agent Hijacking, LLM Jailbreaking. Aligned with OWASP LLM Top 10, NIST AI RMF, ISO 42001. Brand new since February 2026.
GIAC AI Security Automation Engineer
GIAC certification for AI Security Automation. Focus on agentic workflows, automated adversary emulation, AI-enabled response playbooks. Launched April 2026 — brand new.
OffSec AI Security Practitioner
Offensive AI security — adversarial ML, LLM attacks, agent abuse.
CompTIA Security AI+
SecAI+ is CompTIA's answer to the need for certified professionals who combine classic cybersecurity skills with AI-specific security knowledge – officially launched in February 2026. As an 'Expansion Cert,' it is explicitly designed as a complement to existing credentials such as Security+, CySA+, or PenTest+ and targets practitioners who must secure AI systems and defend against AI-enabled attacks. Its strength lies in the practice-oriented domain structure (40% Securing AI Systems) and strong regulatory alignment story around EU AI Act and US Executive Order on AI. Weakness: The certification is only a few weeks old; job postings rarely demand it explicitly, and the market for learning materials is still thin. No hands-on labs in the exam – adversarial ML topics are tested conceptually, not practically.
Browse all certifications → — pick a cert on the interactive map to highlight every domain it covers.
More in Cybersecurity of AI Systems
See how your LLM-Specific Attacks skills stack up
306 questions available. Compete head-to-head or run a quick speed quiz to benchmark yourself.