Pillar B: Applied AI in Security · B3

AI for Vulnerability Management

AI-assisted code review, predictive vulnerability prioritization (EPSS), automated patch assessment.

Part of Pillar B: Applied AI in Security, which groups the disciplines that share methods, tools, and threat models with AI for Vulnerability Management.

What is AI for Vulnerability Management?

Vulnerability management changed in early 2026. Until then, the discipline was about ranking a 50,000-CVE backlog and hoping you patched the right two percent. AI changed both ends of that pipeline at once — the *finding* of vulnerabilities and the *prioritization* of them.

On the discovery side, OpenAI Codex Security (March 2026 research preview) scanned 1.2 million commits in its first 30 days and surfaced 792 critical and 10,561 high-severity findings, with false-positive rates more than 50% below traditional SAST. Anthropic's Claude Mythos, restricted to ~40 partner organizations under Project Glasswing, autonomously discovered a 17-year-old root RCE in FreeBSD's NFS implementation (CVE-2026-4747) and reproduces 83.1% of the vulnerabilities it finds as working exploits on the first attempt. On the prioritization side, EPSS, CISA KEV, exploit-observation feeds, and reachability analysis have matured into a defensible operating model that cuts effective remediation volume by an order of magnitude.

The operational question for security teams is no longer 'can we find more vulnerabilities?' (we can — far too many) but 'can we route, validate, and patch the ones that actually matter while AI-augmented adversaries run the same scans against our externally visible surface?'

Why it matters

You can't out-patch the scanner. The combination of AI-driven discovery and AI-augmented exploitation has compressed the window between 'vulnerability exists' and 'vulnerability is being exploited' to days or hours for the highest-value bugs. Modern VM is about closing that window for the bugs that matter — and explicitly accepting risk on the rest.

AI for vulnerability management connects asset inventory, threat intelligence, code-level discovery, and remediation workflows into a single prioritization layer. It only works when the operating model — ownership, SLAs, exception handling — is as mature as the tooling.

Why this matters operationally

Two things broke the old vulnerability management model in 2026. First, AI-driven scanners — Codex Security, Mythos and its peers, the LLM-augmented features in Snyk, Semgrep, Endor Labs and GitHub Advanced Security — produce volumes of validated, high-severity findings that no human triage queue can absorb. Second, AI-augmented offensive operations compress exploitation timelines to speeds the patch-cycle process was never designed for.

That puts the discipline in an uncomfortable spot: the only durable answer is ruthless prioritization, automation of the patch path itself, and explicit acceptance of risk on the long tail. If your team is still measuring success by total vulnerabilities closed, you're optimizing for the wrong number — and probably exhausting your engineers in the process.

Where this shows up in practice

Codex Security finds a real bug in your repo

A weekly Codex Security run on your monorepo identifies a session-handling flaw in your auth service. Because Codex Security validates each finding by attempting reproduction before flagging, the PR it opens is paired with a working PoC and a draft patch. Your engineer's job is review, not triage. OpenAI's first 30-day cohort surfaced 792 critical and 10,561 high-severity findings across 1.2M commits with >50% lower false-positive rate than rule-based SAST.

Mythos discovers a 17-year-old NFS RCE

In April 2026, Anthropic's Claude Mythos autonomously identified CVE-2026-4747 — a remote root RCE in FreeBSD's NFS server, reachable from any unauthenticated network position, that had survived 17 years of human review. Mythos reproduces 83.1% of the vulnerabilities it finds as working exploits on the first attempt. Defenders learned that the age of code is not evidence of its safety; attackers learned the same thing. Anthropic withheld broad release because the offensive capability outran defensive readiness.

Chained low-severity exploits (MOVEit lesson)

MOVEit Transfer (CVE-2023-34362, May 2023) was the textbook case: a path traversal, a SQL injection, and an insecure deserialization — none individually catastrophic, none triaged urgently in isolation — chained by Cl0p into a global mass-exfiltration event affecting 2,000+ organizations. Modern AI scanners are starting to surface chains, not just individual bugs; modern prioritization needs to score chains as units, not sum CVSS.
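One way to score a chain as a unit rather than summing CVSS is to treat the chain's likelihood as the joint exploitability of its links and its impact as that of the final link. This is a sketch: the independence assumption and the combination rule are illustrative choices, not a published standard, and the scores below are made up rather than the real CVE values.

```python
from math import prod

def chain_priority(steps):
    """steps: list of (cvss, epss) tuples, one per link in the chain.
    Likelihood = product of per-link EPSS (an independence assumption,
    illustrative only); impact = CVSS of the final link, since that is
    the step that actually lands the attacker's objective."""
    likelihood = prod(epss for _, epss in steps)
    impact = steps[-1][0]
    return likelihood * impact

# MOVEit-style chain: path traversal -> SQL injection -> insecure deserialization.
# Scores are illustrative, not the real CVE values.
chain = [(5.3, 0.6), (7.2, 0.5), (8.1, 0.4)]
print(chain_priority(chain))       # one number for the chain as a unit
print(sum(c for c, _ in chain))    # naive CVSS sum, for contrast (can exceed 10)
```

The point of the contrast line: summing CVSS produces numbers with no ceiling and no probabilistic meaning, while scoring the chain as a unit keeps likelihood and impact separable.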

Composability failure (the CGC lesson)

Two services pass independent security review. Each is correct in isolation. A trust assumption in one (a header it accepts as authenticated) doesn't match a trust assumption in the other (a header it sets but doesn't sign). The integration ships, the bug exists, no individual scan finds it. This is the lesson Saltzer & Schroeder articulated in 1975, that DARPA's 2016 Cyber Grand Challenge re-validated when automated patching introduced new flaws, and that AI-driven cross-service scanners are now finding at scale.
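The trust-boundary mismatch above can be made concrete in a few lines. This is a hypothetical sketch — the service names, the `X-Authenticated-User` header, and both functions are invented for illustration, not code from any real system described here:

```python
# Illustrative sketch of the composability failure described above.
# Everything here (header name, both "services") is hypothetical.

def gateway_forward(request_headers: dict, user: str) -> dict:
    """Service B: authenticates the caller, then forwards the identity
    in a plain header -- a header it sets but does not sign."""
    headers = dict(request_headers)
    headers["X-Authenticated-User"] = user
    return headers

def backend_handle(headers: dict) -> str:
    """Service A: trusts the header as if only the gateway could set it.
    Correct in isolation; the flaw only exists once the two compose and
    an attacker can reach A directly with a forged header."""
    return f"acting as {headers.get('X-Authenticated-User', 'anonymous')}"

# Through the gateway: behaves as designed.
print(backend_handle(gateway_forward({}, "alice")))       # acting as alice
# Direct to the backend: the forged header is accepted.
print(backend_handle({"X-Authenticated-User": "admin"}))  # acting as admin
```

Each function passes review on its own; no single-service scan sees the composed path. That is exactly why the bug lives at the trust boundary rather than in either codebase.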

EPSS-driven SLA tiers

Tier-1 SLA (7 days) for CVSS≥7 AND EPSS≥0.7 AND on KEV. Tier-2 (30 days) for everything else above CVSS 7. The backlog quietly shrinks because the team only owes promises on what's actually likely to be exploited — and the prioritization is defensible to auditors and engineering leadership.
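The tiering rule above is mechanical enough to express directly. A minimal sketch, using the thresholds from this section; the `Finding` record and field names are assumptions for illustration, not any vendor's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    cve_id: str
    cvss: float    # CVSS base score, 0-10
    epss: float    # EPSS exploitation probability, 0-1
    on_kev: bool   # listed in the CISA KEV catalog

def sla_days(f: Finding) -> Optional[int]:
    """Return the remediation SLA in days, or None for the risk-accepted tail."""
    if f.cvss >= 7 and f.epss >= 0.7 and f.on_kev:
        return 7     # Tier 1: high-impact AND likely-exploited AND known-exploited
    if f.cvss >= 7:
        return 30    # Tier 2: high-impact, but no strong exploitation signal
    return None      # below the SLA line: tracked, not promised

print(sla_days(Finding("CVE-2023-34362", 9.8, 0.94, True)))  # 7
```

Encoding the policy as a function is part of what makes it defensible: the same rule that drives the ticket queue can be shown to auditors unchanged.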

Key decisions and tradeoffs

Precision vs. coverage

Aggressive deprioritization (for example, only patching findings with EPSS > 0.5) cuts work roughly 95% but accepts risk on the long tail whenever EPSS misses. Most teams stay conservative until the data proves itself in their environment.
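The shape of this tradeoff is easy to see on synthetic data. The sketch below uses a made-up right-skewed score distribution standing in for a real EPSS backlog (real EPSS data is heavily concentrated near zero, but the exact distribution here is an assumption, not FIRST's model):

```python
import random

random.seed(0)
# Hypothetical 50,000-finding backlog. Beta(0.2, 5) is right-skewed,
# so most synthetic "EPSS scores" sit near zero -- illustrative only.
backlog = [random.betavariate(0.2, 5) for _ in range(50_000)]

for threshold in (0.1, 0.5, 0.7):
    in_scope = [s for s in backlog if s > threshold]
    # Sum of scores below the line ~ expected count of exploited-but-unpatched
    # findings, if each score were a true exploitation probability.
    residual = sum(s for s in backlog if s <= threshold)
    print(f"EPSS>{threshold}: patch {len(in_scope)} of {len(backlog)}, "
          f"expected misses in the tail ~ {residual:.0f}")
```

Raising the threshold shrinks the patch queue fast while the residual risk grows slowly — which is the argument for aggressive deprioritization — but the residual never reaches zero, which is the risk the team is explicitly accepting.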

Centralized scoring vs. team-owned context

A central VM team can compute EPSS and CVSS at scale; only the application team knows whether a vulnerable code path is actually reachable in their service. Both layers are required, and the program fails in different ways depending on which one is missing.

AI scanner cost vs. cadence

Running deep AI-driven scans (Codex Security, Mythos-class tools where available) on every PR is expensive; nightly runs are more affordable; weekly runs leave the attacker a longer window. Have the budget conversation explicitly and pick a cadence deliberately.

Patch automation vs. blast radius

Auto-patching is the dream and a real risk. Phased rollouts, canaries, and rollback automation matter more than the patch tool itself — and matter more for AI-generated patches than for vendor-supplied ones.

Disclosure of AI-found bugs

Anthropic withheld Mythos broad release because offensive capability outran defensive readiness. That tradeoff plays out across the ecosystem; defenders depend on coordinated disclosure that's harder to coordinate at machine speed.

Tools and platforms in this domain

Anthropic Claude Mythos (Project Glasswing)
AI-driven autonomous vulnerability discovery
Restricted preview to ~40 partner orgs. 83.1% first-try PoC reproduction; found CVE-2026-4747 in FreeBSD NFS. Anthropic withheld broad release pending defensive readiness.
OpenAI Codex Security
AI-driven code scanning with validated findings + patch PRs
March 2026 research preview. 1.2M commits scanned, 792 critical + 10,561 high-severity found in 30 days, >50% FP reduction vs SAST. Deliberately not a SAST report.
GitHub Advanced Security (Copilot Autofix)
Code scanning with LLM-assisted triage and auto-fixes
Lower-friction entry point than the standalone AI scanners; integrates into PR workflow.
Snyk / Semgrep / Endor Labs
SCA + SAST with reachability analysis
Endor Labs in particular is built around the reachability-cuts-volume premise; 60-90% volume reductions are common.
Tenable / Qualys / Rapid7
Vulnerability scanners
Built-in EPSS and KEV enrichment in modern releases. Still the operational backbone of most enterprise VM programs.
VulnCheck
Exploit intelligence
KEV plus private exploit-observation feeds. The 'what's actually being weaponized today' layer above EPSS.
Wiz / Orca
Cloud VM with attack-path context
Connects vulnerabilities to reachability across cloud assets, IAM, and exposed services — composability failures get visible here.
Vicarius / Kenna
Risk-based VM platforms
Operationalize EPSS+KEV+asset context into a single ticket queue with SLA tiering.

Signals this skill matters in hiring

Modern VM and AppSec interviews probe for prioritization reasoning ('You have 50,000 open vulns and capacity for 200 — show your math'), familiarity with the AI-scanner landscape (Codex Security, Mythos and its successors, Copilot Autofix), and the ability to explain composition failures with a real example. Bonus points for being able to articulate why Codex Security argues SAST is the wrong unit of analysis, or why Anthropic restricted Mythos.

Roles where this matters

Career paths where this domain shows up as core or recommended.

🏗 Security Engineer (Recommended)

Design, build, and maintain security infrastructure. The architects of an organization's defensive posture.

💻 AppSec / DevSecOps Engineer (Core)

Embed security into the software development lifecycle. Shift left to catch vulnerabilities before they reach production.

🐛 Vulnerability Management Lead (Core)

Owns the end-to-end find → prioritize → fix → verify loop at scale, now increasingly AI-driven.

🌐 Threat Exposure Management / Attack Surface Analyst (Core)

External-first role: inventories what an attacker can see, tracks what's new, and drives closure through the org. The outside-in counterpart to vuln management.

People shaping this field

Researchers and practitioners worth following in this space.

Co-creator of EPSS, data scientist at Cyentia Institute

Co-creator of EPSS, researcher at RAND Corporation

Co-founder of Veracode, application security pioneer

Curated resources

Authoritative sources we ground AI for Vulnerability Management questions in — frameworks, research, guides, and tools.

Adjacent concepts and related subdomains

Where most modern vulnerabilities live. Prioritization is theory until app teams patch — AppSec is the operating muscle that turns a queue into closed tickets.

Exploit-observed signals come from threat intel. Without it, EPSS is just a generic prior; with it, the score becomes specific to your moment in time.

Most CVE volume is in transitive dependencies. SBOMs and reachability analysis are how you make that volume tractable rather than unbounded.

When you can't patch, you compensate with detection. Detection engineers own the gap between 'known vulnerable' and 'fixed.'

Composability and integration risk

A foundational concept since Saltzer & Schroeder (1975), re-validated by DARPA's 2016 Cyber Grand Challenge, and now visible at scale via AI scanners that cross service boundaries. Two independently-secure systems can compose into an insecure one — and the failure mode is almost always at the trust boundary.

