Pillar C: Cybersecurity of AI Systems · C6

AI Infrastructure Security

GPU cluster security, ML pipeline security, model serving endpoints, secrets management in ML.

Part of Pillar C: Cybersecurity of AI Systems, which groups the disciplines that share methods, tools, and threat models with AI Infrastructure Security.

What is AI Infrastructure Security?

AI infrastructure security covers the compute, storage, networking, and orchestration systems that power machine learning workloads. Unlike traditional IT infrastructure, AI systems depend on specialized hardware (GPU clusters, TPUs), massive data pipelines, experiment tracking platforms, model registries, and serving infrastructure. Each of these adds attack surface that conventional security tools were not designed to protect.

GPU clusters are high-value targets for attackers. A single NVIDIA H100 GPU costs tens of thousands of dollars, and organizations often run clusters worth millions. Threats include cryptojacking, unauthorized training runs, and GPU memory side-channel attacks. ML pipeline security is equally critical: tools such as Kubeflow, MLflow, and Airflow, along with custom training pipelines, handle sensitive data and model artifacts, often with insufficient authentication, authorization, and audit logging.
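As one illustration of a pipeline control for model artifacts, the sketch below tags an artifact with an HMAC over its SHA-256 digest so that a later stage can check it was not altered in between. This is a minimal sketch, not a description of how Kubeflow, MLflow, or Airflow handle artifacts: the function names are hypothetical, and it assumes a shared key that both stages fetch from a secrets manager rather than from code or config.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def sign_digest(digest: str, key: bytes) -> str:
    """Produce an HMAC-SHA256 tag over the digest (hypothetical helper).

    `key` is assumed to be a shared secret supplied by a secrets manager.
    """
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, expected_tag: str, key: bytes) -> bool:
    """Recompute the tag for `data` and compare it in constant time."""
    actual_tag = sign_digest(sha256_digest(data), key)
    return hmac.compare_digest(actual_tag, expected_tag)
```

In this sketch the training stage would store the tag alongside the artifact, and the serving stage would call `verify_artifact` before loading the model. A shared-key HMAC is the simplest option; asymmetric signatures would avoid giving the verifying stage the ability to sign.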

Model serving infrastructure exposes trained models as API endpoints, creating attack surface for model extraction, denial of service, and adversarial input attacks. Secrets management is particularly challenging in ML environments where API keys, cloud credentials, and data access tokens are frequently embedded in notebooks, configuration files, and container images. Securing AI infrastructure requires adapting DevSecOps practices to MLOps while addressing the unique requirements of GPU workloads, large-scale data movement, and model lifecycle management.
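To make the notebook problem concrete, here is a minimal sketch of a scanner that looks for hard-coded credentials in the code cells of a Jupyter notebook. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`). The two patterns are deliberately small and illustrative; the function names are hypothetical, and a production scanner would need many more rules plus allow-listing to manage false positives.

```python
import json
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal of 8+ characters assigned to a suspicious name.
    "generic_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*"
        r"['\"]([^'\"]{8,})['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each match in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_notebook(notebook_json: str) -> list[tuple[int, int, str]]:
    """Scan code cells; return (cell_index, line_number, pattern_name)."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # this sketch skips markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        for lineno, name in scan_text(source):
            findings.append((index, lineno, name))
    return findings
```

A check like this could run as a pre-commit hook or CI step so that a notebook containing a literal credential is flagged before it reaches a shared repository or container image. It only inspects cell source, not cell outputs, which can also leak secrets.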

Why it matters

AI models are only as secure as the infrastructure they run on. Compromised training pipelines, exposed model endpoints, and misconfigured GPU clusters can undermine every other AI security control.

AI infrastructure security is the operational foundation beneath all other AI security domains. It protects compute, data, and model artifacts throughout the ML lifecycle, from experimentation to production serving.

Layer 3

Build, Connect & Operate

Build and run the systems — apps, cloud, data, networks, OT, AI infra, supply chain, quantum engineering.

Other domains in this layer

A2 Network Security · A4 Application Security · A5 Cloud Security · A12 Data Security, Privacy & Protection · A13 Supply Chain Security · A14 OT/ICS Security · A16 Mobile & IoT Security · A17 Cyber-Electronic Warfare · C3 AI Supply Chain Security · C4 AI Data Security · D5 Quantum Networking & Communication · D6 Quantum Security Engineering · A25 Security Architecture & Engineering · A26 Security Tool & Vendor Landscape

Standards and frameworks

NIST AI 100-1 (AI Risk Management Framework) · NIST

CIS Kubernetes Benchmark · CIS

MITRE ATLAS: ML Infrastructure Attacks · MITRE

Curated resources

Authoritative sources we ground AI Infrastructure Security questions in — frameworks, research, guides, and tools.

NIST · framework

Certifications that signal this domain

Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.