Pillar C: Cybersecurity of AI Systems · C1

Adversarial Machine Learning

Evasion attacks, poisoning attacks, model extraction, membership inference, model inversion, gradient-based attacks.

Part of Pillar C: Cybersecurity of AI Systems, which groups the disciplines that share methods, tools, and threat models with Adversarial Machine Learning.

What is Adversarial Machine Learning?

Adversarial machine learning studies how attackers can manipulate, deceive, and exploit machine learning models through carefully crafted inputs and data manipulation. Unlike traditional software vulnerabilities with clear bug-fix remediation, adversarial ML attacks exploit fundamental properties of how models learn and generalize, making them exceptionally difficult to defend against.

The four primary attack categories define the threat landscape. Evasion attacks craft inputs at inference time that cause misclassification — adversarial examples that look normal to humans but fool classifiers, such as a stop sign with subtle perturbations that an autonomous vehicle reads as a speed limit sign. Poisoning attacks corrupt training data to introduce backdoors or degrade model performance. Model extraction attacks use query access to steal a proprietary model's functionality by training a surrogate. Membership inference attacks determine whether specific data points were in the training set, creating serious privacy risks.
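To make the evasion category concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), the classic gradient-based evasion attack. It assumes a PyTorch classifier, integer class labels, and inputs scaled to [0, 1]; the model and the epsilon budget are illustrative placeholders rather than a reference implementation.

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    # Craft adversarial examples: one signed-gradient step that increases the
    # classification loss, bounded by an L-infinity budget of epsilon.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Illustrative usage: adv_images = fgsm_example(classifier, images, labels)

Comparing accuracy on the clean batch and on the returned perturbed batch gives a quick read on how brittle a classifier is under this perturbation budget.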

Defense research has produced techniques like adversarial training, certified robustness, input preprocessing, and differential privacy, but no silver bullet exists. The field is in a continuous arms race, and the gap between attack sophistication and defensive maturity is widening as models become more complex and deployment becomes more widespread.
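As an illustration of the adversarial training idea, the sketch below runs one Madry-style training step: it perturbs a batch with projected gradient descent (PGD) and then updates the model on the perturbed inputs. The epsilon, step size, and iteration count are illustrative assumptions, and a real loop would also track clean accuracy to monitor the robustness-accuracy tradeoff.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon=0.03, alpha=0.01, steps=10):
    # Iterative signed-gradient ascent on the loss, projected back into the
    # epsilon-ball around the clean input after every step.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    # Train on the perturbed batch instead of the clean one.
    x_adv = pgd_perturb(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()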

Why it matters

As ML models make high-stakes decisions in healthcare, finance, autonomous systems, and security, adversarial vulnerabilities become safety-critical. Understanding these attacks is essential for anyone deploying or securing AI systems.

Adversarial ML is the theoretical foundation for AI security. Every other Pillar C domain — from LLM security to deepfake detection — builds on the attack primitives and defensive concepts established here.

Key topics

Evasion attacks and adversarial examples (FGSM, PGD, C&W)
Data poisoning and backdoor attacks
Model extraction and model stealing
Membership inference and data privacy attacks
Adversarial training as a defense
Certified robustness and provable defenses
Transferability of adversarial examples
Physical-world adversarial attacks
Gradient masking and obfuscated gradients
Robustness evaluation and benchmarking


Curated resources

Authoritative sources we ground Adversarial Machine Learning questions in — frameworks, research, guides, and tools.

MITRE · framework

MITRE ATLAS

Adversarial Threat Landscape for AI Systems. ATT&CK-style knowledge base of adversarial ML techniques, tactics, and real-world case studies.

NIST · framework

NIST AI 100-2 E2023 — Adversarial Machine Learning

Comprehensive taxonomy of adversarial ML attacks and mitigations. Covers evasion, poisoning, extraction, and inference attacks with standardized terminology.

OWASP · framework

OWASP Machine Learning Security Top 10

Top 10 security risks specific to machine learning systems, including supply chain attacks, data poisoning, and model theft.

Academic · research

Explaining and Harnessing Adversarial Examples (Goodfellow et al. 2014)

The seminal paper introducing FGSM (Fast Gradient Sign Method). Established that adversarial examples are a fundamental property of neural networks, not a bug.

Academic · research

Towards Evaluating the Robustness of Neural Networks (Carlini & Wagner 2017)

Introduced the C&W attack, demonstrating that defensive distillation and other defenses could be reliably bypassed. Changed how robustness is evaluated.

Academic · research

Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al. 2018)

Introduced PGD-based adversarial training, currently the most reliable defense against adversarial examples. Established the robustness-accuracy tradeoff.

Academic · research

Practical Black-Box Attacks Against Machine Learning (Papernot et al. 2017)

Demonstrated that adversarial examples transfer between models, enabling black-box attacks via surrogate models. Key work on transferability.

Black Hat / DEF CON · guide

Black Hat / DEF CON Archives

Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks are particularly relevant for Pillars B and C.

University of Cagliari · research

Biggio & Roli — "Wild Patterns: Ten Years After the Rise of Adversarial ML" (Pattern Recognition, 2018)

Historical survey tracing adversarial ML from 2004 spam filters through deep learning. Essential for questions on the evolution and taxonomy of adversarial attacks (evasion, poisoning, model extraction).

Unknown · research

Gu et al. — "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" (2019)

Seminal backdoor attack paper. Demonstrated trojaned models in transfer learning scenarios. Foundational for AI supply chain security questions.

IBM / Trusted AI · tool

Adversarial Robustness Toolbox (ART)

Comprehensive library for adversarial ML. Supports attacks, defenses, and robustness evaluation across multiple ML frameworks.
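As a hedged sketch of how ART is typically used, the function below wraps a trained PyTorch model in an ART classifier, generates FGSM adversarial examples, and reports accuracy on them; class and argument names reflect recent ART releases and should be verified against the version you install.

import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

def evaluate_fgsm_robustness(model, x_test, y_test, eps=0.1):
    # Wrap the trained torch.nn.Module so ART's attacks can query it.
    classifier = PyTorchClassifier(
        model=model,
        loss=nn.CrossEntropyLoss(),
        input_shape=x_test.shape[1:],   # e.g. (1, 28, 28) for MNIST-like data
        nb_classes=int(y_test.max()) + 1,
    )
    # Generate adversarial examples and measure accuracy on them.
    attack = FastGradientMethod(estimator=classifier, eps=eps)  # eps is illustrative
    x_adv = attack.generate(x=x_test)    # x_test, y_test: numpy arrays
    preds = classifier.predict(x_adv).argmax(axis=1)
    return float((preds == y_test).mean())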

Microsoft · tool

Counterfit

Microsoft's tool for assessing the security of ML models. Supports evasion, extraction, and inversion attacks.

Certifications that signal this domain

Credentials whose blueprint meaningfully covers this domain. 'Core' means centrally covered; 'also touched' means present in the blueprint but not the primary focus.

Core coverage

COASP · Professional · EC-Council · Official page →

Certified Offensive AI Security Professional

EC-Council certification for offensive AI security. Focuses on prompt injection, model extraction, training data poisoning, agent hijacking, and LLM jailbreaking. Aligned with the OWASP LLM Top 10, NIST AI RMF, and ISO 42001. Launched in February 2026, so still brand new.

GASAE · Professional · GIAC · Official page →

GIAC AI Security Automation Engineer

GIAC certification for AI security automation. Focuses on agentic workflows, automated adversary emulation, and AI-enabled response playbooks. Launched in April 2026, so still brand new.

GMLE · Professional · GIAC · Official page →

GIAC Machine Learning Engineer


OSAI · Professional · OffSec · Official page →

OffSec AI Security Practitioner

Offensive AI security — adversarial ML, LLM attacks, agent abuse.

SecAI+ · Professional · CompTIA · Official page →

CompTIA Security AI+

SecAI+ is CompTIA's certification for professionals who combine classic cybersecurity skills with AI-specific security knowledge, officially launched in February 2026. As an 'expansion cert,' it is explicitly designed to complement existing credentials such as Security+, CySA+, or PenTest+ and targets practitioners who must secure AI systems and defend against AI-enabled attacks. Its strength lies in the practice-oriented domain structure (40% Securing AI Systems) and a strong regulatory alignment story around the EU AI Act and the US Executive Order on AI. Its weakness: the certification is only a few weeks old, job postings rarely demand it explicitly, and the market for learning materials is still thin. The exam has no hands-on labs; adversarial ML topics are tested conceptually, not practically.
