Where every claim in SecProve comes from.

NIST AI 100-2e2023 — Adversarial Machine LearningNIST

192

cites

FRAMEWORK

added 1mo ago

Comprehensive taxonomy of adversarial ML attacks and mitigations. Covers evasion, poisoning, extraction, and inference attacks with standardized terminology.

●FrameworkIntermediateC1 · Adversarial Machine Learning C5 · AI Red TeamingNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

MITRE ATLASMITRE

Adversarial Threat Landscape for AI Systems. ATT&CK-style knowledge base of adversarial ML techniques, tactics, and real-world case studies.

●FrameworkC1 · Adversarial Machine Learning C2 · LLM-Specific Attacks C5 · AI Red Teaming★ STARTERNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

Microsoft AI Red TeamMicrosoft

Comprehensive guide to AI red teaming from Microsoft's dedicated AI security team. Covers methodology, tools, and findings.

●GuideIntermediateC5 · AI Red TeamingNEW · 1mo ago

NIST AI Risk Management Framework (AI 100-1)NIST

cites

GUIDE

added 1mo ago

The authoritative framework for managing AI risks. Defines four core functions: Govern, Map, Measure, Manage. Essential reading for anyone building or deploying AI systems.

●FrameworkC7 · AI Governance & Risk★ STARTERNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

NIST Cybersecurity Framework 2.0NIST

Updated cybersecurity framework with six core functions: Govern, Identify, Protect, Detect, Respond, Recover.

●FrameworkFoundationalC7 · AI Governance & Risk★ STARTERNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

Deep Learning with Differential Privacy (Abadi et al. 2016)Academic

Introduced DP-SGD for training neural networks with formal differential privacy guarantees. Foundation for private ML.

●ResearchAdvancedC4 · AI Data SecurityNEW · 1mo ago

Membership Inference Attacks Against Machine Learning Models (Shokri et al. 2017)Academic

cites

RESEARCH

added 1mo ago

First practical membership inference attack against ML models. Showed that ML APIs leak information about their training data.

●ResearchAdvancedC4 · AI Data SecurityNEW · 1mo ago

Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al. 2018)Academic

cites

RESEARCH

added 1mo ago

Introduced PGD-based adversarial training, currently the most reliable defense against adversarial examples. Established the robustness-accuracy tradeoff.

●ResearchAdvancedC1 · Adversarial Machine LearningNEW · 1mo ago

ISO/IEC 42001 — AI Management SystemISO

cites

RESEARCH

added 1mo ago

International standard for establishing and maintaining an AI management system. Includes 39 controls across 10 areas.

●FrameworkAdvancedC7 · AI Governance & RiskNEW · 1mo ago

Gu et al. — "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" (2019)Unknown

cites

FRAMEWORK

added 1mo ago

Seminal backdoor attack paper. Demonstrated trojaned models in transfer learning scenarios. Foundational for AI supply chain security questions.

●ResearchIntermediateC3 · AI Supply Chain Security C1 · Adversarial Machine LearningNEW · 22d ago

Practical Black-Box Attacks Against Machine Learning (Papernot et al. 2017)Academic

cites

RESEARCH

added 22d ago

Demonstrated that adversarial examples transfer between models, enabling black-box attacks via surrogate models. Key work on transferability.

●ResearchAdvancedC1 · Adversarial Machine LearningNEW · 1mo ago

Towards Evaluating the Robustness of Neural Networks (Carlini & Wagner 2017)Academic

cites

RESEARCH

added 1mo ago

Introduced the C&W attack, demonstrating that defensive distillation and other defenses could be reliably bypassed. Changed how robustness is evaluated.

●ResearchAdvancedC1 · Adversarial Machine LearningNEW · 1mo ago

cites

RESEARCH

added 1mo ago

Anthropic Research IndexAnthropic

Collection of Anthropic's published research on AI safety, alignment, interpretability, and security.

●GuideIntermediateC8 · AI Safety & Alignment C2 · LLM-Specific AttacksNEW · 1mo ago

cites

GUIDE

added 1mo ago

EU AI ActEuropean Union

The European Union's comprehensive AI regulation. Classifies AI systems by risk level and sets requirements for high-risk systems.

●FrameworkIntermediateC7 · AI Governance & RiskNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

Microsoft PyRITMicrosoft

Python Risk Identification Toolkit for generative AI. Automated red teaming framework for testing LLM applications.

●ToolIntermediateC5 · AI Red Teaming C2 · LLM-Specific AttacksNEW · 1mo ago

cites

TOOL

added 1mo ago

NIST Privacy FrameworkNIST

Voluntary framework for improving privacy through enterprise risk management. Complements the Cybersecurity Framework.

●FrameworkIntermediateC4 · AI Data SecurityNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

Explaining and Harnessing Adversarial Examples (Goodfellow et al. 2014)Academic

The seminal paper introducing FGSM (Fast Gradient Sign Method). Established that adversarial examples are a fundamental property of neural networks, not a bug.

●ResearchAdvancedC1 · Adversarial Machine LearningNEW · 1mo ago

Extracting Training Data from Large Language Models (Carlini et al. 2021)Academic

cites

RESEARCH

added 1mo ago

Demonstrated that LLMs memorize and can be prompted to regurgitate training data verbatim, including PII. Foundational work on LLM privacy risks.

●ResearchAdvancedC2 · LLM-Specific Attacks C4 · AI Data SecurityNEW · 1mo ago

cites

RESEARCH

added 1mo ago

C2PA SpecificationC2PA

Coalition for Content Provenance and Authenticity. Technical standard for digital content provenance and integrity.

●FrameworkAdvancedC9 · Deepfakes & Synthetic MediaNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

SafeTensors DocumentationHugging Face

Hugging Face's safe serialization format for ML models. Prevents arbitrary code execution from pickle-based attacks.

●GuideFoundationalC3 · AI Supply Chain SecurityNEW · 1mo ago

cites

GUIDE

added 1mo ago

Crescendo: Multi-Turn LLM Jailbreak Attack (Microsoft 2024)Microsoft

Showed that gradually escalating benign conversations can bypass safety filters over multiple turns. Defeats per-message safety checks.

●ResearchAdvancedC2 · LLM-Specific AttacksNEW · 1mo ago

Not What You've Signed Up For: Compromising RAG with Indirect Prompt Injection (Greshake et al. 2023)Academic

cites

RESEARCH

added 1mo ago

Demonstrated indirect prompt injection attacks through RAG documents, emails, and web content. Essential reading for RAG security.

●ResearchIntermediateC2 · LLM-Specific AttacksNEW · 1mo ago

Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al. 2023)Academic

cites

RESEARCH

added 1mo ago

The GCG attack paper. Showed that adversarial suffixes can bypass safety alignment in LLMs, transferring across models.

●ResearchAdvancedC2 · LLM-Specific AttacksNEW · 1mo ago

cites

RESEARCH

added 1mo ago

CISA Deepfake Detection ResourcesCISA

CISA guidance on understanding, detecting, and defending against deepfake threats in organizational contexts.

●GuideFoundationalC9 · Deepfakes & Synthetic Media C10 · AI-Enabled DisinformationNEW · 1mo ago

cites

GUIDE

added 1mo ago

Amodei et al. — "Concrete Problems in AI Safety" (2016)Google Brain / OpenAI

Five practical safety problems: avoiding side effects, reward hacking, scalable oversight, safe exploration, distributional shift. Still the canonical taxonomy for AI safety research questions.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 22d ago

cites

RESEARCH

added 22d ago

Hugging Face — Model Security and SafetyHugging Face

The largest model hub. Security features: malware scanning, pickle scanning, safetensors format. Questions on model provenance, serialization risks (pickle exploits), and model marketplace trust.

●ToolIntermediateC3 · AI Supply Chain SecurityNEW · 22d ago

cites

TOOL

added 22d ago

LangChain Security Best PracticesLangChain

Security documentation for LangChain agent framework — sandboxing, tool permissions, prompt injection defenses, and deployment hardening.

●GuideIntermediateC11 · Agentic AI SecurityNEW · 1mo ago

NIST SP 800-190 — Container Security GuideNIST

cites

GUIDE

added 1mo ago

Application container security guide covering image, registry, orchestrator, container, and host OS security.

●FrameworkIntermediateC6 · AI Infrastructure SecurityNEW · 1mo ago

cites

FRAMEWORK

added 1mo ago

Garak — LLM Vulnerability ScannerNVIDIA

NVIDIA's open-source LLM vulnerability scanner. Tests for prompt injection, jailbreaking, data leakage, and more.

●ToolIntermediateC5 · AI Red Teaming C2 · LLM-Specific AttacksNEW · 1mo ago

OpenAI — "Influence Operations Reports" (Threat Intelligence)OpenAI

cites

TOOL

added 1mo ago

Reports on state-affiliated actors using AI for influence operations. Documents actual observed misuse, not theoretical risks. Key for questions about real-world AI-enabled disinformation.

●ResearchIntermediateC10 · AI-Enabled DisinformationNEW · 22d ago

cites

RESEARCH

added 22d ago

RAND — "The Firehose of Falsehood" and Information Warfare researchRAND Corporation

Research on propaganda techniques, cognitive security, and information warfare. The "firehose of falsehood" model explains high-volume, multi-channel disinformation. Good for strategic questions.

●ResearchIntermediateC10 · AI-Enabled DisinformationNEW · 22d ago

cites

RESEARCH

added 22d ago

MLflow / Kubeflow / Ray Security DocumentationVarious (Databricks, Google, Anyscale)

Security docs for major ML platforms. Covers authentication, authorization, experiment tracking security, model registry access controls. Practical infrastructure security questions.

●ToolIntermediateC6 · AI Infrastructure SecurityNEW · 22d ago

cites

TOOL

added 22d ago

Machine Unlearning (Bourtoule et al. 2021)Academic

Introduced SISA training for efficient machine unlearning — enabling models to "forget" specific training data without full retraining.

●ResearchAdvancedC4 · AI Data SecurityNEW · 1mo ago

RESEARCH

added 1mo ago

RobustBench — Adversarial Robustness BenchmarkAcademic

Standardized benchmark for evaluating adversarial robustness of ML models. Leaderboard of most robust models.

●ToolAdvancedC1 · Adversarial Machine LearningNEW · 1mo ago

TOOL

added 1mo ago

TruthfulQA BenchmarkAcademic

Benchmark measuring whether language models generate truthful answers. Tests for common misconceptions and falsehoods.

●ToolAdvancedC8 · AI Safety & AlignmentNEW · 1mo ago

TOOL

added 1mo ago

Content Authenticity Initiative (CAI)Adobe-led

Industry coalition implementing C2PA. Open-source tools for content credentials. Practical implementation questions about provenance at scale.

●ToolIntermediateC9 · Deepfakes & Synthetic MediaNEW · 22d ago

TOOL

added 22d ago

DEF CON 31 — AI Village Red Team ChallengeAI Village / DEF CON

Largest public AI red teaming event. 2,200+ participants testing multiple foundation models. Established community norms for responsible AI red teaming. Good for questions on practical red team methodology.

●GuideIntermediateC5 · AI Red TeamingNEW · 22d ago

GUIDE

added 22d ago

Anthropic — "Challenges in Deploying Machine Learning Agents" researchAnthropic

Analysis of risks specific to AI agents: tool use, chain-of-thought exploitation, multi-step task failures, delegation risks. Key for understanding why agents create new attack surfaces beyond single-turn interactions.

●ResearchIntermediateC11 · Agentic AI SecurityNEW · 22d ago

RESEARCH

added 22d ago

Anthropic — "Red Teaming Language Models to Reduce Harms" (2022)Anthropic

Crowdsourced red teaming methodology with 38,961 attacks across multiple models. Taxonomy of harmful outputs and effectiveness of different red teaming strategies. Key reference for structured AI red teaming.

●ResearchIntermediateC5 · AI Red TeamingNEW · 22d ago

RESEARCH

added 22d ago

Anthropic Responsible Scaling PolicyAnthropic

Anthropic's framework for responsible AI development. Defines AI Safety Levels (ASL) and capability thresholds.

●GuideIntermediateC8 · AI Safety & AlignmentNEW · 1mo ago

GUIDE

added 1mo ago

Constitutional AI: Harmlessness from AI Feedback (Bai et al. 2022)Anthropic

Anthropic's approach to AI alignment using a set of principles (a "constitution") to train helpful and harmless AI. Foundation of modern RLHF alternatives.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 1mo ago

RESEARCH

added 1mo ago

Many-Shot Jailbreaking (Anthropic 2024)Anthropic

Demonstrated that long-context LLMs can be jailbroken by providing many examples of the desired behavior. Scales with context window size.

●ResearchIntermediateC2 · LLM-Specific AttacksNEW · 1mo ago

RESEARCH

added 1mo ago

Model Context Protocol (MCP) SpecificationAnthropic

Anthropic's open protocol for connecting AI models to external tools and data sources. Critical reading for agentic AI security.

●FrameworkIntermediateC11 · Agentic AI SecurityNEW · 1mo ago

FRAMEWORK

added 1mo ago

C2PA (Coalition for Content Provenance and Authenticity)C2PA (Adobe, Microsoft, BBC, others)

Technical standard for content provenance. Cryptographic binding of creation metadata to content. The leading technical approach to synthetic media authentication. Questions on architecture, limitations, and adoption challenges.

●FrameworkIntermediateC9 · Deepfakes & Synthetic Media C10 · AI-Enabled DisinformationNEW · 22d ago

Center for AI Safety (CAIS) — "An Overview of Catastrophic AI Risks"CAIS

FRAMEWORK

added 22d ago

Comprehensive taxonomy of AI risks: weaponization, misinformation, power concentration, value lock-in, rogue AI. Good for strategic-level safety questions beyond technical alignment.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

Kubernetes Security Best PracticesCNCF

Official Kubernetes documentation on securing clusters, pods, and workloads. Essential for ML infrastructure security.

●GuideIntermediateC6 · AI Infrastructure SecurityNEW · 1mo ago

GUIDE

added 1mo ago

DISARM FrameworkDISARM Foundation

Framework for analyzing and countering disinformation. Provides a structured approach to information manipulation threats.

●FrameworkIntermediateC10 · AI-Enabled DisinformationNEW · 1mo ago

FRAMEWORK

added 1mo ago

EU AI Act — High-Risk System RequirementsEuropean Parliament

(See cross-cutting.md.) For C7 specifically: conformity assessments, technical documentation requirements, post-market monitoring, fundamental rights impact assessments. Detailed compliance questions.

●FrameworkIntermediateC7 · AI Governance & RiskNEW · 22d ago

Europol — "Facing Reality: Law Enforcement and the Challenge of Deepfakes" (2022)Europol

FRAMEWORK

added 22d ago

Law enforcement perspective on deepfake threats: evidence tampering, identity fraud, CEO fraud, CSAM. Policy and response frameworks.

●FrameworkIntermediateC9 · Deepfakes & Synthetic MediaNEW · 22d ago

FRAMEWORK

added 22d ago

Gartner — Top Strategic Technology TrendsGartner

Annual trends report. AI trust, risk, and security management (AI TRiSM) has been featured prominently. Good for strategic-level questions about where the industry is heading.

●ResearchIntermediateC11 · Agentic AI SecurityNEW · 22d ago

RESEARCH

added 22d ago

Gartner Hype Cycle for AI (2024)Gartner

Positions AI security technologies on the hype cycle. Useful for questions about technology maturity, adoption timelines, and distinguishing hype from operational readiness.

●ResearchIntermediateC7 · AI Governance & RiskNEW · 22d ago

RESEARCH

added 22d ago

Goldstein et al. — "Generative Language Models and Automated Influence Operations" (2023)Georgetown / OpenAI / Stanford

Analysis of how LLMs can amplify influence operations: cost reduction, scalability, personalization, multilingual content. Framework for assessing disinformation risk from generative AI.

●ResearchIntermediateC10 · AI-Enabled DisinformationNEW · 22d ago

Google — "Differential Privacy in Practice" (DP libraries)Google

RESEARCH

added 22d ago

Open-source DP libraries and practical guides. Bridges theory to implementation. Good for questions on real-world DP deployment challenges and privacy budget management.

●ResearchIntermediateC4 · AI Data SecurityNEW · 22d ago

RESEARCH

added 22d ago

Google Secure AI Framework (SAIF)Google

Google's conceptual framework for securing AI systems. Covers supply chain, data governance, and deployment security.

●FrameworkIntermediateC7 · AI Governance & Risk C8 · AI Safety & AlignmentNEW · 1mo ago

FRAMEWORK

added 1mo ago

DeepMind — "Scalable Agent Alignment via Reward Modeling" and safety researchGoogle DeepMind

Research on reward modeling, debate, recursive reward modeling, and interpretability. Provides an alternative perspective to Anthropic/OpenAI approaches.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

Google DeepMind — "Evaluating Frontier Models for Dangerous Capabilities" (2023)Google DeepMind

Framework for evaluating dangerous capabilities: persuasion, deception, cyber operations, self-replication. Defines evaluation methodology for frontier model safety. Questions on what to test and how to interpret results.

●ResearchIntermediateC5 · AI Red Teaming C8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

Google SynthIDGoogle DeepMind

Google DeepMind's watermarking technology for AI-generated content. Embeds imperceptible watermarks in images, audio, and text.

●ToolFoundationalC9 · Deepfakes & Synthetic MediaNEW · 1mo ago

TOOL

added 1mo ago

Nasr et al. — "Scalable Extraction of Training Data from (Production) Language Models" (2023)Google DeepMind

Extracted training data from ChatGPT (production model) using a divergence attack. Showed alignment doesn't prevent memorization. Questions on the gap between safety fine-tuning and data protection.

●ResearchIntermediateC4 · AI Data SecurityNEW · 22d ago

RESEARCH

added 22d ago

Hugging Face Security DocumentationHugging Face

Security best practices for using Hugging Face Hub — model scanning, SafeTensors, access controls, and supply chain considerations.

●GuideFoundationalC3 · AI Supply Chain SecurityNEW · 1mo ago

GUIDE

added 1mo ago

Adversarial Robustness Toolbox (ART)IBM / Trusted AI

Comprehensive library for adversarial ML. Supports attacks, defenses, and robustness evaluation across multiple ML frameworks.

●ToolIntermediateC1 · Adversarial Machine Learning C5 · AI Red TeamingNEW · 1mo ago

JFrog — "Malicious Models on Hugging Face" researchJFrog

TOOL

added 1mo ago

Discovered 100+ malicious models on Hugging Face exploiting pickle deserialization for code execution. Real-world evidence of AI supply chain attacks. Good for scenario-based questions.

●GuideIntermediateC3 · AI Supply Chain SecurityNEW · 22d ago

GUIDE

added 22d ago

CounterfitMicrosoft

Microsoft's tool for assessing the security of ML models. Supports evasion, extraction, and inversion attacks.

●ToolIntermediateC1 · Adversarial Machine LearningNEW · 1mo ago

TOOL

added 1mo ago

Microsoft — "Lessons Learned from Red-Teaming 100 Generative AI Products"Microsoft

Practical lessons from large-scale LLM red teaming across real products. Covers failure modes, testing methodologies, and organizational patterns. Rare insight into enterprise-scale AI security.

●GuideIntermediateC2 · LLM-Specific Attacks C5 · AI Red TeamingNEW · 22d ago

GUIDE

added 22d ago

Dwork & Roth — "The Algorithmic Foundations of Differential Privacy" (2014)Microsoft Research / UPenn

The theoretical foundation for differential privacy. Essential for questions on privacy-preserving ML training (DP-SGD) and the epsilon-delta framework.

●ResearchIntermediateC4 · AI Data SecurityNEW · 22d ago

MIT Media Lab — "The Spread of True and False News Online" (Vosoughi et al., Science, 2018)MIT

RESEARCH

added 22d ago

Landmark study: false news spreads farther, faster, deeper than true news on social media. Not AI-specific but foundational for understanding why AI-generated disinformation is dangerous.

●ResearchIntermediateC10 · AI-Enabled DisinformationNEW · 22d ago

NIST AI 600-1 — AI RMF Generative AI ProfileNIST

RESEARCH

added 22d ago

Companion to AI RMF 1.0 specifically for generative AI. Maps 12 GenAI risks to RMF actions. Covers CBRN, CSAM, confabulation, data privacy, environmental, human-AI interaction, information integrity, IP, obscenity, toxicity, value chain.

●FrameworkIntermediateC5 · AI Red Teaming C2 · LLM-Specific Attacks C7 · AI Governance & RiskNEW · 22d ago

FRAMEWORK

added 22d ago

NIST AI RMF 1.0 + PlaybookNIST

(See cross-cutting.md for details.) The primary AI governance framework for US context. Questions should test practical application of Govern/Map/Measure/Manage, not just recall.

●FrameworkIntermediateC7 · AI Governance & RiskNEW · 22d ago

FRAMEWORK

added 22d ago

SBOM for AI/ML — AI BOM (Bill of Materials)NTIA / OpenSSF

Extending software bill of materials concepts to AI: model cards, data cards, training provenance. Emerging standard for AI supply chain transparency.

●FrameworkIntermediateC3 · AI Supply Chain SecurityNEW · 22d ago

NVIDIA — "AI Infrastructure Security Best Practices"NVIDIA

FRAMEWORK

added 22d ago

GPU cluster security, multi-tenant GPU isolation, model serving infrastructure hardening. Vendor-specific but covers unique infrastructure challenges (GPU memory isolation, CUDA vulnerabilities) not covered elsewhere.

●GuideIntermediateC6 · AI Infrastructure SecurityNEW · 22d ago

OpenAI — "Practices for Governing Agentic AI Systems" (2024)OpenAI

GUIDE

added 22d ago

Framework for agentic AI governance: scope control, human oversight, auditability, containment. Defines key properties agents should have and failure modes to prevent.

●ResearchIntermediateC11 · Agentic AI SecurityNEW · 22d ago

OpenAI — "Red Teaming Network" and GPT-4 System CardOpenAI

RESEARCH

added 22d ago

Description of external red teaming program and findings from GPT-4 pre-deployment testing. The system card details risk categories, testing methodology, and residual risks.

●ResearchIntermediateC5 · AI Red Teaming C8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

OpenAI — "Weak-to-Strong Generalization" (2023)OpenAI

Research on the core alignment challenge: can weaker systems supervise stronger ones? Showed partial generalization is possible. Key for superalignment and scalable oversight questions.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

SLSA — Supply-chain Levels for Software ArtifactsOpenSSF

Framework for ensuring the integrity of software artifacts throughout the supply chain. Applicable to ML model pipelines.

●FrameworkIntermediateC3 · AI Supply Chain SecurityNEW · 1mo ago

OWASP — "Top 10 for LLM Applications: Agentic Applications" (2025 supplement)OWASP

FRAMEWORK

added 1mo ago

Extension of the LLM Top 10 specifically for agentic patterns. Covers excessive agency, insecure plugin/tool design, and multi-agent trust boundaries.

●ToolIntermediateC11 · Agentic AI Security C2 · LLM-Specific AttacksNEW · 22d ago

TOOL

added 22d ago

OWASP Agentic AI SecurityOWASP

OWASP guidance on securing agentic AI systems — tool use, delegation chains, memory poisoning, and multi-agent architectures.

●GuideIntermediateC11 · Agentic AI SecurityNEW · 1mo ago

GUIDE

added 1mo ago

OWASP Machine Learning Security Top 10OWASP

Top 10 security risks specific to machine learning systems, including supply chain attacks, data poisoning, and model theft.

●FrameworkIntermediateC1 · Adversarial Machine Learning C3 · AI Supply Chain SecurityNEW · 1mo ago

FRAMEWORK

added 1mo ago

Responsible AI Institute — RAI CertificationResponsible AI Institute

Certification program for responsible AI. Assessment criteria across fairness, explainability, accountability, robustness. Emerging industry certification.

●ResearchIntermediateC7 · AI Governance & RiskNEW · 22d ago

RESEARCH

added 22d ago

Stanford Internet ObservatoryStanford

Research group studying abuse in information technologies, including AI-enabled disinformation, platform manipulation, and election interference.

●GuideIntermediateC10 · AI-Enabled DisinformationNEW · 1mo ago

GUIDE

added 1mo ago

Stanford HAI — AI Index Report (Annual)Stanford Institute for Human-Centered AI

Comprehensive annual data on AI progress: research output, investment, policy, public opinion, technical performance. The best source for quantitative AI landscape questions.

●ResearchIntermediateC7 · AI Governance & RiskNEW · 22d ago

RESEARCH

added 22d ago

Trail of Bits — "AI/ML Security Auditing" researchTrail of Bits

Security audit firm with deep AI/ML expertise. Published research on pickle deserialization attacks, model file format security, and ML pipeline vulnerabilities. Technical depth from a security-first perspective.

●GuideIntermediateC6 · AI Infrastructure Security C3 · AI Supply Chain SecurityNEW · 22d ago

GUIDE

added 22d ago

FaceForensics++TU Munich

Large-scale benchmark dataset and tools for detecting facial manipulation in images and video. Used for deepfake detection research.

●ToolAdvancedC9 · Deepfakes & Synthetic MediaNEW · 1mo ago

TOOL

added 1mo ago

Biggio & Roli — "Wild Patterns: Ten Years After the Rise of Adversarial ML" (Pattern Recognition, 2018)University of Cagliari

Historical survey tracing adversarial ML from 2004 spam filters through deep learning. Essential for questions on the evolution and taxonomy of adversarial attacks (evasion, poisoning, model extraction).

●ResearchIntermediateC1 · Adversarial Machine LearningNEW · 22d ago

Carlini et al. — "Extracting Training Data from Diffusion Models" (USENIX Security 2023)Unknown

RESEARCH

added 22d ago

Extended training data extraction to image models. Showed Stable Diffusion memorizes and regurgitates training images. Important for multimodal AI data security questions.

●ResearchIntermediateC4 · AI Data SecurityNEW · 22d ago

Christiano et al. — "Deep Reinforcement Learning from Human Feedback"Unknown

RESEARCH

added 22d ago

The RLHF paper that enabled ChatGPT-style alignment. Reward model from human preferences + PPO. Foundational for understanding modern alignment approaches and their limitations.

●ResearchIntermediateC8 · AI Safety & AlignmentNEW · 22d ago

RESEARCH

added 22d ago

Mialon et al. — "Augmented Language Models: A Survey"Unknown

Survey of tool-using, retrieval-augmented, and reasoning LMs. The architectural foundation for understanding agent capabilities and their security implications.

●ResearchIntermediateC11 · Agentic AI SecurityNEW · 22d ago

Mirsky & Lee — "The Creation and Detection of Deepfakes: A Survey" (ACM Computing Surveys, 2021)Unknown

RESEARCH

added 22d ago

Comprehensive survey covering generation techniques (autoencoders, GANs, diffusion), detection approaches (visual artifacts, frequency analysis, physiological signals), and the arms race dynamic.

●ResearchIntermediateC9 · Deepfakes & Synthetic MediaNEW · 22d ago

RESEARCH

added 22d ago

Perez & Ribeiro — "Ignore This Title and HackAPrompt"Unknown

Largest prompt injection competition dataset. Taxonomy of prompt injection techniques: context ignoring, fake completion, payload splitting, obfuscation. Empirical data on attack success rates across models.

●ResearchIntermediateC2 · LLM-Specific AttacksNEW · 22d ago

Rossler et al. — "FaceForensics++: Learning to Detect Manipulated Facial Images"Unknown

RESEARCH

added 22d ago

Benchmark dataset and detection methods for facial manipulation. Covers DeepFakes, Face2Face, FaceSwap, NeuralTextures. Standard reference for deepfake detection evaluation.

●ResearchIntermediateC9 · Deepfakes & Synthetic MediaNEW · 22d ago

Ruan et al. — "Identifying the Risks of LM Agents with an LM-Emulated Sandbox" (2024)Unknown

RESEARCH

added 22d ago

ToolEmu framework for evaluating agent risks in sandboxed environments. 36 risk categories across tool use failures. Practical methodology for agent security testing questions.

●ResearchIntermediateC11 · Agentic AI SecurityNEW · 22d ago

Wei et al. — "Jailbroken: How Does LLM Safety Training Fail?"Unknown

RESEARCH

added 22d ago

Systematic analysis of jailbreak techniques: competing objectives and mismatched generalization. Framework for understanding why safety training is inherently incomplete. Essential for nuanced jailbreak questions.

●ResearchIntermediateC2 · LLM-Specific AttacksNEW · 22d ago