AI Supply Chain Security
Model provenance, dataset poisoning, Hugging Face risks, ML library vulnerabilities, trojanized models.
What is AI Supply Chain Security?
AI supply chain security addresses the risks introduced when organizations consume pre-trained models, datasets, and ML libraries from external sources. Just as traditional software supply chains became a prime attack vector (SolarWinds, Log4Shell), the AI supply chain presents analogous — and in some ways more dangerous — risks because models are opaque binaries that can embed backdoors invisible to code review.
Model provenance is a critical challenge. When a team downloads a model from Hugging Face, they inherit every risk from that model's training process — poisoned training data, embedded backdoors, malicious serialization payloads (pickle deserialization attacks are rampant), and undisclosed biases. The Hugging Face ecosystem alone hosts over a million models, many with minimal vetting. Researchers have demonstrated that malicious models can execute arbitrary code upon loading through Python's pickle format.
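The pickle risk described above is easy to demonstrate with only the standard library: Python's pickle protocol lets any object define `__reduce__`, which names a callable to invoke at load time. This is exactly the mechanism malicious model files abuse. A minimal, harmless sketch:

```python
import pickle

class MaliciousStub:
    """Illustrative only: __reduce__ tells pickle what to call on load."""
    def __reduce__(self):
        # A real attack would return something like (os.system, ("...",));
        # here we evaluate a harmless expression to prove code runs on load.
        return (eval, ("40 + 2",))

blob = pickle.dumps(MaliciousStub())
result = pickle.loads(blob)  # no MaliciousStub is created; eval runs instead
print(result)  # → 42
```

Note that the victim never has to use the object: the code executes during `pickle.loads` itself, which is why scanning a model's weights after loading is already too late.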
Dataset poisoning in the supply chain is equally concerning. Popular datasets like LAION, Common Crawl, and curated benchmark sets can be manipulated at scale. Attackers can contribute poisoned examples to public datasets, compromise data pipelines, or create convincing fake datasets that introduce subtle backdoors. Defending the AI supply chain requires model signing, provenance tracking, dependency scanning, and runtime integrity verification.
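Digest pinning is one concrete form of the provenance tracking mentioned above: record a cryptographic hash of each approved artifact and refuse anything that drifts. A minimal standard-library sketch (the manifest, file names, and digest value are hypothetical placeholders):

```python
import hashlib
from pathlib import Path

# Hypothetical pinned-digest manifest; in practice this would live in version
# control and change only through reviewed commits.
PINNED_DIGESTS = {
    "train.csv": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def sha256_file(path: Path) -> str:
    """Stream the file so large dataset shards need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: Path) -> bool:
    """Reject artifacts that are unpinned or whose content has changed."""
    expected = PINNED_DIGESTS.get(path.name)
    return expected is not None and sha256_file(path) == expected
```

Pinning only proves the bytes match what was reviewed; it does not prove the reviewed data was clean, which is why it complements rather than replaces dataset vetting.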
Why it matters
Organizations building on open-source models and public datasets inherit every upstream risk. Without supply chain security controls, a poisoned model from Hugging Face or a backdoored dataset can compromise an entire AI deployment.
AI supply chain security connects traditional software supply chain disciplines (SBOMs, dependency management, code signing) to the unique challenges of ML artifacts — models, datasets, and training pipelines — where traditional scanning tools are blind.
Curated resources
Authoritative sources we ground AI Supply Chain Security questions in — frameworks, research, guides, and tools.
OWASP Machine Learning Security Top 10
Top 10 security risks specific to machine learning systems, including supply chain attacks, data poisoning, and model theft.
Hugging Face Security Documentation
Security best practices for using Hugging Face Hub — model scanning, SafeTensors, access controls, and supply chain considerations.
Trail of Bits — "AI/ML Security Auditing" research
Security audit firm with deep AI/ML expertise. Published research on pickle deserialization attacks, model file format security, and ML pipeline vulnerabilities. Technical depth from a security-first perspective.
Gu et al. — "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" (2019)
Seminal backdoor attack paper. Demonstrated trojaned models in transfer learning scenarios. Foundational for AI supply chain security questions.
Hugging Face — Model Security and Safety
The largest model hub. Security features: malware scanning, pickle scanning, safetensors format. Good for questions on model provenance, serialization risks (pickle exploits), and model marketplace trust.
JFrog — "Malicious Models on Hugging Face" research
Discovered 100+ malicious models on Hugging Face exploiting pickle deserialization for code execution. Real-world evidence of AI supply chain attacks. Good for scenario-based questions.
SBOM for AI/ML — AI BOM (Bill of Materials)
Extending software bill of materials concepts to AI: model cards, data cards, training provenance. Emerging standard for AI supply chain transparency.
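As a rough illustration of what an AI-BOM entry might capture (the field names below are illustrative; there is no single ratified schema yet, and all names and placeholder values are hypothetical):

```python
import json

# Hypothetical AI-BOM record tying a model to its data and training provenance.
ai_bom = {
    "model": {
        "name": "example-org/sentiment-classifier",   # illustrative name
        "version": "1.2.0",
        "format": "safetensors",
        "sha256": "<artifact digest>",                # placeholder
    },
    "datasets": [
        {"name": "example-reviews-v3", "license": "CC-BY-4.0",
         "sha256": "<digest>"},                        # placeholder
    ],
    "training": {
        "base_model": "bert-base-uncased",
        "pipeline_commit": "<git SHA of training code>",  # placeholder
    },
}

print(json.dumps(ai_bom, indent=2))
```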
ProtectAI — AI/ML Vulnerability Database (huntr)
Bug bounty platform focused on AI/ML vulnerabilities. Real-world vulnerability data in ML frameworks and models. Good for grounding tool security questions in actual discovered vulnerabilities.
SafeTensors Documentation
Hugging Face's safe serialization format for ML models. Prevents arbitrary code execution from pickle-based attacks.
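Part of why safetensors is safe to load is the file layout itself: an 8-byte little-endian length prefix, a UTF-8 JSON header describing each tensor's dtype, shape, and byte offsets, then raw tensor bytes. Loading is pure data parsing, with no callable to execute. A stdlib-only sketch of reading the header, under that understanding of the format:

```python
import json
import struct

def read_safetensors_header(blob: bytes) -> dict:
    """Parse a safetensors header: u64 LE length prefix, then JSON metadata.
    Unlike pickle, nothing here can trigger code execution."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len])

# Build a minimal in-memory safetensors file: one fp32 tensor of shape (2,).
header = json.dumps(
    {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode()
blob = struct.pack("<Q", len(header)) + header + struct.pack("<2f", 1.0, 2.0)

meta = read_safetensors_header(blob)
print(meta["weight"]["shape"])  # → [2]
```

In practice you would use the `safetensors` library rather than hand-parsing; the sketch just shows why the format is inert by construction.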
SLSA — Supply-chain Levels for Software Artifacts
Framework for ensuring the integrity of software artifacts throughout the supply chain. Applicable to ML model pipelines.
Sigstore — Software Supply Chain Security
Open-source project for signing, verifying, and protecting software supply chains. Keyless signing for artifacts.