Pillar C: Cybersecurity of AI Systems · C4

AI Data Security

Training data poisoning, PII leakage from models, differential privacy, federated learning security.

Part of Pillar C: Cybersecurity of AI Systems, which groups the disciplines that share methods, tools, and threat models with AI Data Security.

What is AI Data Security?

AI data security focuses on protecting the data that powers machine learning systems across the entire data lifecycle — from collection and curation through training, fine-tuning, and inference. Training data is the DNA of every AI model, and its compromise can have cascading effects that are nearly impossible to detect or remediate after the model is deployed.

Training data poisoning is a primary concern. Attackers who can influence even a small fraction of training data can implant backdoors that activate only on specific trigger inputs, degrade model performance on targeted classes, or embed biases that serve adversarial objectives. The scale of modern training datasets (billions of tokens for LLMs, millions of images for vision models) makes comprehensive data validation extremely challenging.
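
To make the threat concrete, here is a minimal sketch of a classic backdoor poisoning setup, using synthetic NumPy data and illustrative parameter names of our own choosing: the attacker stamps a small trigger patch onto half a percent of the images and relabels them to a target class, so a model trained on the set learns the trigger while clean-input accuracy stays normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 10,000 8x8 grayscale images, 10 classes.
X = rng.random((10_000, 8, 8)).astype(np.float32)
y = rng.integers(0, 10, size=10_000)

TRIGGER_VALUE = 1.0   # a white 2x2 corner patch acts as the trigger
TARGET_CLASS = 7      # the attacker's desired output on triggered inputs
POISON_RATE = 0.005   # 0.5% of the data suffices for many backdoors

n_poison = int(POISON_RATE * len(X))
idx = rng.choice(len(X), size=n_poison, replace=False)

# Stamp the trigger and flip the label on the poisoned subset only.
X[idx, :2, :2] = TRIGGER_VALUE
y[idx] = TARGET_CLASS

# A model trained on (X, y) can learn "trigger patch -> class 7" while
# behaving normally on clean inputs, which is why the backdoor survives
# standard accuracy-based validation.
print(f"poisoned {n_poison} of {len(X)} examples ({POISON_RATE:.1%})")
```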

PII leakage from trained models is another critical risk. LLMs have been shown to memorize and regurgitate verbatim training data, including personally identifiable information, API keys, and proprietary content. Differential privacy provides mathematical guarantees that individual training examples cannot be extracted, but it comes at a cost to model utility. Data security for AI requires a combination of data governance, privacy-preserving techniques, access controls on training infrastructure, and continuous monitoring for data exfiltration through model outputs.
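
As one hedged illustration of that last point, monitoring model outputs for exfiltration, the sketch below flags generations containing email- or API-key-shaped strings before they are returned to the caller. The regexes and the hardcoded generation are illustrative stand-ins, not production-grade detectors:

```python
import re

# Illustrative patterns only; real deployments use vetted secret/PII scanners.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_output(text: str) -> list[str]:
    """Return the names of patterns that match a model generation."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

# Stub generation to keep the sketch self-contained.
generation = "Sure! Contact jane.doe@example.com or use key AKIA0123456789ABCDEF."

hits = scan_output(generation)
if hits:
    # Block, redact, or log for review instead of returning verbatim.
    print(f"potential leakage: {hits}")
```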

Why it matters

The integrity and privacy of training data directly determines the trustworthiness of every model built from it. A single data compromise can create vulnerabilities that persist across every downstream application.

AI data security bridges traditional data governance and privacy with the unique requirements of ML systems, where data doesn't just need to be protected at rest and in transit — it needs to be validated, curated, and audited as a security-critical input to model behavior.

Roles where this matters

Career paths where this domain shows up as core or recommended.

🤖 AI Security Engineer · Recommended

Secure AI/ML systems from adversarial attacks, data poisoning, and model compromise. The fastest-growing specialization in cybersecurity.

🔒 Privacy Engineer / DPO · Core

Build privacy into systems by design. Navigate GDPR, CCPA, and emerging AI privacy regulations.

AI Governance / AI Risk Specialist · Core

The policy/controls counterpart to the AI Security Engineer — owns risk frameworks, regulatory mapping (EU AI Act, NIST AI RMF), model documentation, and AI incident response policy.

🖥 ML Platform Security Engineer · Core

Secures the platform that trains, stores, and serves ML models — multi-tenant GPU isolation, pipeline integrity, feature-store hygiene, secrets management in ML workflows.

📦 Product Security Engineer · Recommended

Embedded in a product team — owns threat modelling, secure design, libraries, dependency risk, and increasingly the AI-specific hardening of LLM features the product ships.

Certifications that signal this domain

Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.

Core coverage

CRAI · Professional · ISACA

ISACA Certified in Risk of Artificial Intelligence (emerging)

AI risk management and governance — emerging blueprint, expect revisions.

Also touched

AIGP · Professional · IAPP

Artificial Intelligence Governance Professional

AI risk, governance, and regulatory literacy (EU AI Act, NIST AI RMF).

CIPM · Professional · IAPP

Certified Information Privacy Manager

Running a privacy program end-to-end.

CIPT · Professional · IAPP

Certified Information Privacy Technologist

Privacy engineering, privacy-by-design in products and platforms.

People shaping this field

Researchers and practitioners worth following in this space.

Cynthia Dwork · Pioneer of differential privacy

Vitaly Shmatikov · Cornell professor, research on ML privacy and data poisoning

Florian Tramèr · ETH Zurich professor, model extraction and training data privacy

Curated resources

Authoritative sources we ground AI Data Security questions in — frameworks, research, guides, and tools.

NIST · framework

NIST Privacy Framework

Voluntary framework for improving privacy through enterprise risk management. Complements the Cybersecurity Framework.

Academic · research

Extracting Training Data from Large Language Models (Carlini et al. 2021)

Demonstrated that LLMs memorize and can be prompted to regurgitate training data verbatim, including PII. Foundational work on LLM privacy risks.
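
A hedged sketch of the paper's ranking signal, roughly zlib entropy divided by model perplexity: memorized text scores high because the model finds it unusually likely even though it is not trivially compressible. The avg_logprob stub, with a hardcoded "memorized" string, stands in for querying the real target model:

```python
import math
import zlib

# Toy stand-in for text the target model has memorized; a real attack
# queries the LLM for log-probabilities instead.
MEMORIZED = {"John Q. Public, SSN 078-05-1120, Apt 4B"}

def avg_logprob(text: str) -> float:
    """Stub for the model's average log-probability per token."""
    return -0.1 if text in MEMORIZED else -3.0

def zlib_entropy(text: str) -> float:
    """Compressed size in bits: a model-free redundancy baseline."""
    return 8.0 * len(zlib.compress(text.encode("utf-8")))

def extraction_score(text: str) -> float:
    """High when the model finds the text far more likely than its
    compressibility alone would suggest, a signature of memorization."""
    perplexity = math.exp(-avg_logprob(text))
    return zlib_entropy(text) / perplexity

candidates = [
    "the the the the the the the the",            # generic, repetitive
    "John Q. Public, SSN 078-05-1120, Apt 4B",    # looks memorized
]
for text in sorted(candidates, key=extraction_score, reverse=True):
    print(f"{extraction_score(text):9.2f}  {text!r}")
```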

Academic · research

Membership Inference Attacks Against Machine Learning Models (Shokri et al. 2017)

First practical membership inference attack against ML models. Showed that ML APIs leak information about their training data.
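
The simplest modern descendant of this attack is a loss threshold: overfit models assign lower loss to training members than to unseen points. The sketch below uses synthetic loss distributions in place of a real model; Shokri et al.'s full pipeline trains shadow models and an attack classifier, which this deliberately simplifies:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-example losses standing in for a real model's
# cross-entropy: members (seen in training) score lower on average.
member_losses = rng.normal(loc=0.3, scale=0.2, size=1000).clip(min=0)
nonmember_losses = rng.normal(loc=1.0, scale=0.4, size=1000).clip(min=0)

threshold = 0.6  # illustrative; a real attack tunes this on shadow models

def infer_membership(loss: float) -> bool:
    """Guess 'was this example in the training set?' from its loss."""
    return loss < threshold

tp = np.mean([infer_membership(l) for l in member_losses])
fp = np.mean([infer_membership(l) for l in nonmember_losses])
print(f"attack true-positive rate: {tp:.2f}, false-positive rate: {fp:.2f}")
```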

Academic · research

Deep Learning with Differential Privacy (Abadi et al. 2016)

Introduced DP-SGD for training neural networks with formal differential privacy guarantees. Foundation for private ML.
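
A minimal NumPy sketch of the DP-SGD update, with illustrative hyperparameter values; in practice you would use a maintained implementation such as Opacus or TensorFlow Privacy, plus a privacy accountant to convert the noise multiplier into an epsilon:

```python
import numpy as np

rng = np.random.default_rng(2)

CLIP_NORM = 1.0    # per-example gradient clipping bound C
NOISE_MULT = 1.1   # Gaussian noise multiplier sigma (drives the epsilon)
LR = 0.1

def dp_sgd_step(per_example_grads: np.ndarray, params: np.ndarray) -> np.ndarray:
    """One DP-SGD update: clip each example's gradient to norm C,
    average, then add Gaussian noise with scale sigma * C / batch_size."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / CLIP_NORM)
    noisy_mean = clipped.mean(axis=0) + rng.normal(
        scale=NOISE_MULT * CLIP_NORM / len(per_example_grads),
        size=params.shape,
    )
    return params - LR * noisy_mean

# Toy batch of 32 per-example gradients for a 5-parameter model.
grads = rng.normal(size=(32, 5))
params = dp_sgd_step(grads, np.zeros(5))
print(params)
```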

Academic · research

Machine Unlearning (Bourtoule et al. 2021)

Introduced SISA training for efficient machine unlearning — enabling models to "forget" specific training data without full retraining.
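
A hedged sketch of the SISA (Sharded, Isolated, Sliced, Aggregated) idea; the toy "models" here are just per-shard feature means, whereas real SISA trains a full model per shard, checkpoints slices within each shard, and aggregates the ensemble's votes at inference:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy dataset, partitioned into disjoint shards.
X = rng.normal(size=(600, 4))
N_SHARDS = 6
shards = [list(chunk) for chunk in np.array_split(np.arange(len(X)), N_SHARDS)]

def train(indices: list[int]) -> np.ndarray:
    """Stub trainer: stands in for fitting a real model on one shard."""
    return X[indices].mean(axis=0)

models = [train(idx) for idx in shards]

def unlearn(point: int) -> None:
    """Forget one example by retraining only the shard that contains it,
    instead of retraining on the full dataset."""
    for s, idx in enumerate(shards):
        if point in idx:
            idx.remove(point)
            models[s] = train(idx)
            return

unlearn(42)  # retrains ~1/6 of the data rather than all 600 examples
print(sum(len(idx) for idx in shards), "examples still covered")
```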

Academic · research

Carlini et al. — "Extracting Training Data from Diffusion Models" (USENIX Security 2023)

Extended training data extraction to image models. Showed Stable Diffusion memorizes and regurgitates training images. Important for multimodal AI data security questions.

Google DeepMind · research

Nasr et al. — "Scalable Extraction of Training Data from (Production) Language Models" (2023)

Extracted training data from ChatGPT (a production model) using a divergence attack. Showed that alignment doesn't prevent memorization. Good for questions on the gap between safety fine-tuning and data protection.

Microsoft Research / UPenn · research

Dwork & Roth — "The Algorithmic Foundations of Differential Privacy" (2014)

The theoretical foundation for differential privacy. Essential for questions on privacy-preserving ML training (DP-SGD) and the epsilon-delta framework.
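
For reference, the definition at the heart of the book: a randomized mechanism M is (ε, δ)-differentially private if, for every pair of neighboring datasets D and D′ (differing in one record) and every set S of outputs,

```latex
\Pr[\, \mathcal{M}(D) \in S \,] \;\le\; e^{\varepsilon} \, \Pr[\, \mathcal{M}(D') \in S \,] + \delta
```

Smaller ε means a stronger guarantee; δ bounds the probability that the guarantee fails outright.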

Google · research

Google — "Differential Privacy in Practice" (DP libraries)

Open-source DP libraries and practical guides. Bridges theory to implementation. Good for questions on real-world DP deployment challenges and privacy budget management.
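
To illustrate the budget-management problem in the abstract, a generic sketch under basic sequential composition, where epsilons and deltas simply add; this is not the interface of Google's libraries, which ship much tighter accountants (advanced composition, RDP, the moments accountant):

```python
class PrivacyBudget:
    """Track cumulative (epsilon, delta) spend across queries and refuse
    any query that would exceed the agreed budget."""

    def __init__(self, epsilon_max: float, delta_max: float) -> None:
        self.epsilon_max, self.delta_max = epsilon_max, delta_max
        self.epsilon_spent, self.delta_spent = 0.0, 0.0

    def charge(self, epsilon: float, delta: float = 0.0) -> None:
        # Basic sequential composition: costs add linearly.
        if (self.epsilon_spent + epsilon > self.epsilon_max
                or self.delta_spent + delta > self.delta_max):
            raise RuntimeError("privacy budget exhausted; refuse the query")
        self.epsilon_spent += epsilon
        self.delta_spent += delta

budget = PrivacyBudget(epsilon_max=1.0, delta_max=1e-5)
budget.charge(0.4)    # first analyst query
budget.charge(0.4)    # second query
# budget.charge(0.4)  # would raise: total epsilon 1.2 > 1.0
print(f"spent epsilon = {budget.epsilon_spent:.1f}")
```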
