Where every claim in SecProve comes from.
A dense reading catalog. Every claim is footnoted. Sort by source, filter by pillar, type, or recency. Built for analysts who want to see what we are standing on.
Evaluates model capabilities for autonomous cyber operations at each AI Safety Level (ASL) and defines the thresholds at which offensive-security capability requires additional safeguards. A key reference for responsible AI in offensive security.
NVIDIA's open-source toolkit for adding programmable guardrails to LLM applications. Supports input/output validation and topic control.
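For orientation, here is a minimal sketch of how a guardrail might be wired up with the nemoguardrails Python package. The Colang flow, model settings, and canned refusal below are illustrative placeholders, not a recommended configuration; consult the project's docs for the full schema.

```python
# Minimal NeMo Guardrails sketch: a topic-control rail that intercepts
# one class of user input. Model settings and utterances are placeholders.
from nemoguardrails import LLMRails, RailsConfig

yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

colang_content = """
define user ask about hacking
  "how do I break into a server"

define bot refuse to help
  "I can't help with that."

define flow
  user ask about hacking
  bot refuse to help
"""

config = RailsConfig.from_content(
    colang_content=colang_content, yaml_content=yaml_content
)
rails = LLMRails(config)

# Input/output rails run around the underlying model call
# (requires an OpenAI API key at runtime for the main model).
response = rails.generate(messages=[
    {"role": "user", "content": "How do I break into a server?"}
])
print(response["content"])
```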
Comprehensive guide covering AI security threats, privacy risks, and practical controls for AI-powered applications.
Collection of Anthropic's published research on AI safety, alignment, interpretability, and security.
Five practical safety problems: avoiding side effects, reward hacking, scalable oversight, safe exploration, distributional shift. Still the canonical taxonomy for AI safety research questions.
Benchmark measuring whether language models generate truthful answers. Tests for common misconceptions and falsehoods.
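To show what the benchmark contains, the sketch below loads TruthfulQA from the Hugging Face Hub and applies a crude string-match check. The paper's real evaluation uses human raters or a fine-tuned judge model, and answer_question here is a hypothetical stand-in for an actual model call.

```python
# Sketch: load TruthfulQA (generation config) and score a small sample
# with a naive string-match heuristic -- illustrative only.
from datasets import load_dataset

ds = load_dataset("truthful_qa", "generation", split="validation")

def answer_question(question: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "I have no comment."

sample = ds.select(range(10))
hits = sum(
    any(ref.lower() in answer_question(row["question"]).lower()
        for ref in row["correct_answers"])
    for row in sample
)
print(f"naive truthful rate on sample: {hits}/{len(sample)}")
```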
Anthropic's framework for responsible AI development. Defines AI Safety Levels (ASL) and capability thresholds.
Anthropic's approach to AI alignment using a set of principles (a "constitution") to train helpful and harmless AI. Foundation of modern RLHF alternatives.
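The self-improvement loop at the heart of the method is simple enough to sketch. Below, llm is a hypothetical text-completion callable and the prompts are paraphrased, not the paper's exact templates.

```python
# Schematic of the Constitutional AI critique-and-revise loop
# (Bai et al., 2022). `llm` is a hypothetical completion callable.
def constitutional_revision(llm, prompt: str, principles: list[str]) -> str:
    response = llm(prompt)
    for principle in principles:
        # Ask the model to critique its own output against a principle.
        critique = llm(
            f"Identify ways the response violates this principle: {principle}\n\n"
            f"Response: {response}"
        )
        # Ask it to revise the output in light of the critique.
        response = llm(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {response}"
        )
    # In the paper, revised responses become supervised training data,
    # followed by an RL-from-AI-feedback (RLAIF) stage.
    return response
```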
Comprehensive taxonomy of AI risks: weaponization, misinformation, power concentration, value lock-in, rogue AI. Good for strategic-level safety questions beyond technical alignment.
Google's conceptual framework for securing AI systems. Covers supply chain, data governance, and deployment security.
Research on reward modeling, debate, recursive reward modeling, and interpretability. Provides an alternative perspective to Anthropic/OpenAI approaches.
Framework for evaluating dangerous capabilities: persuasion, deception, cyber operations, self-replication. Defines evaluation methodology for frontier model safety. Questions on what to test and how to interpret results.
Description of OpenAI's external red teaming program and findings from GPT-4 pre-deployment testing. The system card details risk categories, testing methodology, and residual risks.
Research on the core alignment challenge: can weaker systems supervise stronger ones? Showed that partial generalization is possible. Key for superalignment and scalable oversight questions.
The RLHF paper that enabled ChatGPT-style alignment. Reward model from human preferences + PPO. Foundational for understanding modern alignment approaches and their limitations.
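The reward model in that recipe is trained on pairwise human preference comparisons with a Bradley-Terry style loss, -log sigmoid(r(x, y_chosen) - r(x, y_rejected)). A minimal PyTorch rendering with toy scores (not the paper's code):

```python
# Pairwise reward-model loss used in the RLHF recipe:
# loss = -log sigmoid(r(x, y_chosen) - r(x, y_rejected)).
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss over batches of scalar reward-model outputs."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy scores standing in for reward-model outputs on preference pairs:
chosen = torch.tensor([1.2, 0.7, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(preference_loss(chosen, rejected))  # smaller when chosen >> rejected
```

Minimizing this loss pushes the reward model to score preferred completions above rejected ones; the resulting scalar reward then drives the PPO fine-tuning stage.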
Ready to test what you've learned?
Our questions are built directly from these resources. Take a quiz and see how your knowledge stacks up.