AI Red Teaming
AI system threat modeling, red teaming methodology for LLMs (OWASP Top 10 for LLMs), automated red teaming tools, evaluation frameworks.
What is AI Red Teaming?
AI red teaming applies offensive security methodology to artificial intelligence systems, systematically probing models, agents, and AI-powered applications for vulnerabilities that traditional security testing cannot find. Unlike conventional penetration testing, AI red teaming requires understanding both the AI attack surface (prompt injection, jailbreaking, data extraction) and the unique failure modes of AI systems (hallucination, unsafe content generation, alignment failures).
The OWASP Top 10 for LLM Applications provides the foundational risk taxonomy, covering vulnerabilities from prompt injection (LLM01) through excessive agency (LLM08) to model theft (LLM10). AI red teamers combine these known vulnerability classes with creative adversarial thinking to find novel attack chains — such as using prompt injection to manipulate an agent's tool calls, or chaining hallucination with output handling vulnerabilities to achieve code execution.
Automated red teaming is rapidly evolving. Tools like Microsoft's PyRIT, NVIDIA's Garak, and custom harnesses enable systematic testing at scale — generating adversarial prompts, testing for content policy violations, evaluating guardrail bypass techniques, and fuzzing agent tool integrations. The field draws from both traditional application security testing and AI safety evaluation, creating a new discipline that requires expertise in both.
Why it matters
AI systems cannot be secured without being adversarially tested. AI red teaming is how organizations discover the gap between their intended AI behavior and what attackers can actually make these systems do.
AI red teaming is the offensive counterpart to AI safety and governance. It provides empirical evidence of AI risk by demonstrating real attacks, grounding abstract risk frameworks in concrete, exploitable vulnerabilities.
Detect, Test & Respond
Watch, hunt, attack ethically, analyse, and respond — classical and AI.
Other domains in this layer
Key topics
People shaping this field
Researchers and practitioners worth following in this space.
AI security researcher, creator of the AI Attack Surface Map
Security researcher, expert on indirect prompt injection
Researcher on prompt injection taxonomy and Learn Prompting
Curated resources
Authoritative sources we ground AI Red Teaming questions in — frameworks, research, guides, and tools.
OWASP Top 10 for LLM Applications 2025
The definitive security risk list for LLM-powered applications. Covers prompt injection, insecure output handling, training data poisoning, and more.
MITRE ATLAS
Adversarial Threat Landscape for AI Systems. ATT&CK-style knowledge base of adversarial ML techniques, tactics, and real-world case studies.
NIST AI 100-2e2023 — Adversarial Machine Learning
Comprehensive taxonomy of adversarial ML attacks and mitigations. Covers evasion, poisoning, extraction, and inference attacks with standardized terminology.
Microsoft AI Red Team
Comprehensive guide to AI red teaming from Microsoft's dedicated AI security team. Covers methodology, tools, and findings.
Black Hat / DEF CON Archives
Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks particularly relevant for Pillars B and C.
NIST AI 600-1 — AI RMF Generative AI Profile
Companion to AI RMF 1.0 specifically for generative AI. Maps 12 GenAI risks to RMF actions. Covers CBRN, CSAM, confabulation, data privacy, environmental, human-AI interaction, information integrity, IP, obscenity, toxicity, value chain.
DEF CON 31 — AI Village Red Team Challenge
Largest public AI red teaming event. 2,200+ participants testing multiple foundation models. Established community norms for responsible AI red teaming. Good for questions on practical red team methodology.
Google DeepMind — "Evaluating Frontier Models for Dangerous Capabilities" (2023)
Framework for evaluating dangerous capabilities: persuasion, deception, cyber operations, self-replication. Defines evaluation methodology for frontier model safety. Questions on what to test and how to interpret results.
Harang & Ruef — "Securing LLMs Against Jailbreaking for Cybersecurity" (2024)
Analysis of how LLMs can be used for offensive security tasks and the implications for defensive guardrails. Covers the dual-use nature of security LLMs.
Bishop Fox — AI Red Teaming Research
Research on using AI for penetration testing automation: reconnaissance, vulnerability discovery, exploit generation. Practitioner perspective on what's practical vs. theoretical.
Anthropic — "Responsible Scaling Policy" and capability evaluations
Evaluates model capabilities for autonomous cyber operations at each AI Safety Level (ASL). Defines thresholds where AI capability in offensive security requires additional safeguards. Key reference for responsible AI in offensive security.
Microsoft — "Lessons Learned from Red-Teaming 100 Generative AI Products"
Practical lessons from large-scale LLM red teaming across real products. Covers failure modes, testing methodologies, and organizational patterns. Rare insight into enterprise-scale AI security.
Roles where this matters
Career paths where this domain shows up as core or recommended.
Ethically hack systems to find vulnerabilities before attackers do. Offensive security requires deep technical knowledge.
Secure AI/ML systems from adversarial attacks, data poisoning, and model compromise. The fastest-growing specialization in cybersecurity.
Secures the platform that trains, stores, and serves ML models — multi-tenant GPU isolation, pipeline integrity, feature-store hygiene, secrets management in ML workflows.
Embedded in a product team — owns threat modelling, secure design, libraries, dependency risk, and increasingly the AI-specific hardening of LLM features the product ships.
Certifications that signal this domain
Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.
Core coverage
OffSec AI Security Practitioner
Offensive AI security — adversarial ML, LLM attacks, agent abuse.
Browse all certifications → — pick a cert on the interactive map to highlight every domain it covers.
More in Cybersecurity of AI Systems
See how your AI Red Teaming skills stack up
301 questions available. Compete head-to-head or run a quick speed quiz to benchmark yourself.