Pillar C: Cybersecurity of AI SystemsC2

LLM-Specific Attacks

Prompt injection (direct & indirect), jailbreaking, prompt leaking, training data extraction, hallucination exploitation, agent manipulation.

Part of Pillar C: Cybersecurity of AI Systems · Cybersecurity of AI Systems groups the disciplines that share methods, tools, and threat models with LLM-Specific Attacks.

What is LLM-Specific Attacks?

Large Language Model (LLM) security is the fastest-evolving attack surface in cybersecurity. As organizations rush to deploy chatbots, copilots, and AI agents, they're exposing themselves to an entirely new class of vulnerabilities that traditional security tools can't detect — prompt injection, jailbreaking, training data extraction, and agent manipulation.

Prompt injection is particularly dangerous because it turns the AI's own capabilities against the system. A carefully crafted input can make an LLM ignore its instructions, leak system prompts, access unauthorized data through tool use, or execute actions the user shouldn't be able to trigger. Unlike SQL injection where the payload syntax is well-defined, prompt injection payloads are expressed in natural language — making them harder to filter and detect.

The OWASP Top 10 for LLM Applications has become the definitive framework for understanding these risks, and the field is seeing rapid development of defensive techniques including guardrails, output filtering, prompt firewalls, and multi-model safety architectures.

Why it matters

Every organization deploying LLMs is exposing a new attack surface. Understanding LLM-specific attacks is no longer optional — it's as fundamental as understanding web application security was a decade ago.

LLM attacks represent the intersection of traditional application security and AI safety. As AI systems gain more capabilities (tool use, code execution, data access), the blast radius of successful prompt injection grows exponentially.

Key topics

Direct prompt injection (user-facing attacks)
Indirect prompt injection (via documents, emails, web content)
Jailbreaking techniques (many-shot, crescendo, DAN)
Training data extraction and memorization
System prompt leaking
Agent and tool-use manipulation
Hallucination exploitation
Output handling vulnerabilities
Multi-turn conversational attacks
Prompt firewalls and defensive architectures
OWASP Top 10 for LLM Applications

People shaping this field

Researchers and practitioners worth following in this space.

Researcher on prompt injection and LLM security

Researcher on indirect prompt injection

Google DeepMind researcher on adversarial ML and LLM attacks

Curated resources

Authoritative sources we ground LLM-Specific Attacks questions in — frameworks, research, guides, and tools.

OWASPframework

OWASP Top 10 for LLM Applications 2025

The definitive security risk list for LLM-powered applications. Covers prompt injection, insecure output handling, training data poisoning, and more.

MITREframework

MITRE ATLAS

Adversarial Threat Landscape for AI Systems. ATT&CK-style knowledge base of adversarial ML techniques, tactics, and real-world case studies.

Academicresearch

Extracting Training Data from Large Language Models (Carlini et al. 2021)

Demonstrated that LLMs memorize and can be prompted to regurgitate training data verbatim, including PII. Foundational work on LLM privacy risks.

Academicresearch

Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al. 2023)

The GCG attack paper. Showed that adversarial suffixes can bypass safety alignment in LLMs, transferring across models.

Academicresearch

Not What You've Signed Up For: Compromising RAG with Indirect Prompt Injection (Greshake et al. 2023)

Demonstrated indirect prompt injection attacks through RAG documents, emails, and web content. Essential reading for RAG security.

Anthropicresearch

Many-Shot Jailbreaking (Anthropic 2024)

Demonstrated that long-context LLMs can be jailbroken by providing many examples of the desired behavior. Scales with context window size.

Microsoftresearch

Crescendo: Multi-Turn LLM Jailbreak Attack (Microsoft 2024)

Showed that gradually escalating benign conversations can bypass safety filters over multiple turns. Defeats per-message safety checks.

Anthropicguide

Anthropic Research Index

Collection of Anthropic's published research on AI safety, alignment, interpretability, and security.

Black Hat / DEF CONguide

Black Hat / DEF CON Archives

Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks particularly relevant for Pillars B and C.

NISTframework

NIST AI 600-1 — AI RMF Generative AI Profile

Companion to AI RMF 1.0 specifically for generative AI. Maps 12 GenAI risks to RMF actions. Covers CBRN, CSAM, confabulation, data privacy, environmental, human-AI interaction, information integrity, IP, obscenity, toxicity, value chain.

OWASPtool

OWASP — "Top 10 for LLM Applications: Agentic Applications" (2025 supplement)

Extension of the LLM Top 10 specifically for agentic patterns. Covers excessive agency, insecure plugin/tool design, and multi-agent trust boundaries.

UIUCresearch

Fang et al. — "LLM Agents Can Autonomously Hack Websites" (2024)

Demonstrated GPT-4 exploiting real-world web vulnerabilities autonomously. 73% success rate on day-one CVEs. Key reference for questions about AI-augmented offensive capabilities and the asymmetry debate.

Roles where this matters

Career paths where this domain shows up as core or recommended.

🤖AI Security EngineerCore

Secure AI/ML systems from adversarial attacks, data poisoning, and model compromise. The fastest-growing specialization in cybersecurity.

AI Governance / AI Risk SpecialistRecommended

The policy/controls counterpart to the AI Security Engineer — owns risk frameworks, regulatory mapping (EU AI Act, NIST AI RMF), model documentation, and AI incident response policy.

🖥ML Platform Security EngineerRecommended

Secures the platform that trains, stores, and serves ML models — multi-tenant GPU isolation, pipeline integrity, feature-store hygiene, secrets management in ML workflows.

📦Product Security EngineerCore

Embedded in a product team — owns threat modelling, secure design, libraries, dependency risk, and increasingly the AI-specific hardening of LLM features the product ships.

Certifications that signal this domain

Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.

Core coverage

COASPProfessional·EC-CouncilOfficial page →

Certified Offensive AI Security Professional

EC-Council certification for offensive AI security. Focus on Prompt Injection, Model Extraction, Training Data Poisoning, Agent Hijacking, LLM Jailbreaking. Aligned with OWASP LLM Top 10, NIST AI RMF, ISO 42001. Brand new since February 2026.

GASAEProfessional·GIACOfficial page →

GIAC AI Security Automation Engineer

GIAC certification for AI Security Automation. Focus on agentic workflows, automated adversary emulation, AI-enabled response playbooks. Launched April 2026 — brand new.

OSAIProfessional·OffSecOfficial page →

OffSec AI Security Practitioner

Offensive AI security — adversarial ML, LLM attacks, agent abuse.

SecAI+Professional·CompTIAOfficial page →

CompTIA Security AI+

SecAI+ is CompTIA's answer to the need for certified professionals who combine classic cybersecurity skills with AI-specific security knowledge – officially launched in February 2026. As an 'Expansion Cert,' it is explicitly designed as a complement to existing credentials such as Security+, CySA+, or PenTest+ and targets practitioners who must secure AI systems and defend against AI-enabled attacks. Its strength lies in the practice-oriented domain structure (40% Securing AI Systems) and strong regulatory alignment story around EU AI Act and US Executive Order on AI. Weakness: The certification is only a few weeks old; job postings rarely demand it explicitly, and the market for learning materials is still thin. No hands-on labs in the exam – adversarial ML topics are tested conceptually, not practically.

Browse all certifications → — pick a cert on the interactive map to highlight every domain it covers.

More in Cybersecurity of AI Systems

See how your LLM-Specific Attacks skills stack up

306 questions available. Compete head-to-head or run a quick speed quiz to benchmark yourself.