Pillar B: Applied AI in Security · B4

AI in Offensive Security

AI-assisted pentesting, automated recon, AI-generated phishing/social engineering, deepfake attacks.

Part of Pillar B: Applied AI in Security, which groups the disciplines that share methods, tools, and threat models with AI in Offensive Security.

What is AI in Offensive Security?

Attackers have access to the same models you do — and in 2026, sometimes better ones. The cost of producing a convincing spear-phish has dropped from hours of skilled writing to seconds of prompting; the cost of synthetic voice good enough to fool a help desk has dropped from a TV studio to a free Hugging Face model. And with restricted-release tools like Anthropic's Claude Mythos demonstrating autonomous discovery and exploitation of decades-old vulnerabilities (CVE-2026-4747 in FreeBSD NFS, 17 years undetected, 83.1% PoC reproduction first try), the *technical* end of the offensive AI capability curve has compressed timelines that defensive processes were never designed for.

The defensive question isn't 'is this happening' (it is, daily) but 'which of our controls were calibrated for the pre-AI cost curve.' Phishing training built around 'look for typos,' help-desk identity verification by voice, and IR playbooks that assume one targeted attack at a time were all reasonable answers in 2018 — and are reasonable to revisit in 2026.

This page is deliberately defensive in framing. It covers what attackers can do with AI well enough that defenders can model the threat, update their controls, and brief executives accurately. It does not cover offensive procedures — those belong in red team engagements with documented rules of engagement, not in an open editorial.

Why it matters

You can't defend against capabilities you don't believe exist. The most expensive failures of the last few years — MGM and Caesars (2023, vishing the help desk), the Arup Hong Kong deepfake CFO video fraud (2024, $25M wire), the Retool voice-clone of an executive (2023) — all involved capabilities defenders had been told about but hadn't internalized into process. The defensive failure was disbelief, not technology.

AI in offensive security is the adversary's evolving toolkit. Understanding it directly informs defensive strategy, red team scope, threat modeling, and the vulnerability management response (because Mythos-class autonomous discovery shortens patch windows). Defenders who skip this domain build threat models for the wrong decade.

Why this matters operationally

Two structural shifts matter most. First, the marginal cost of a high-quality social engineering attack has collapsed. Targeted, multilingual, context-aware lures that used to require skilled human attackers now require minutes of prompting. The implication: filter-based defenses still help against volume, but the targeted attacks at the top of the quality curve are now both more common and more dangerous.

Second, technical attack capabilities are scaling on a different curve. Tools like Claude Mythos demonstrate that AI can autonomously find and exploit complex vulnerabilities in code that survived decades of human review. Anthropic chose not to release Mythos publicly because the offensive capability outran defensive readiness — a reasonable decision and not a permanent one. Defenders should plan as if equivalent capabilities will be available to motivated adversaries within a 12–24 month horizon.

Where this shows up in practice

Targeted phishing at scale

A campaign generates a unique, contextually relevant lure for every recipient, pulled from public profiles, written in the recipient's native language with reference to recent meetings and projects. Click rates climb because nothing pattern-matches as a template. The defensive response isn't 'better filters' alone — it's process changes around the actions phishing tries to trigger (wire approvals, credential resets, code-signing requests).

Deepfake voice and video in vendor fraud (Retool 2023, Arup 2024)

In 2023, attackers used a cloned voice of a Retool executive in a vishing call to an IT employee, leading to a compromise that exposed customer data. In early 2024, attackers used a deepfake video conference call impersonating Arup's CFO to trick a Hong Kong employee into authorizing $25M in wire transfers. Modern help desks now require a callback to a known number; modern wire-approval flows require dual-channel confirmation regardless of urgency. Both incidents predate the worst of the 2025-2026 capability curve.

Help-desk vishing (MGM and Caesars, 2023)

Scattered Spider used a roughly 10-minute social-engineering call to MGM's IT help desk to reset MFA on a privileged account, leading to a $100M+ outage. Caesars paid a reported $15M ransom after a parallel attack by the same group. Neither attack required AI, but both illustrate the help-desk identity-verification weaknesses that AI-driven voice cloning makes dramatically worse.

AI-augmented reconnaissance

Public assets, certificate transparency logs, GitHub history, and breach data correlate automatically into a target dossier in minutes. Asset inventory hygiene becomes a defensive control, not a CMDB chore — what's discoverable about you is what attackers will use.
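The flip side is measurable: you can enumerate your own certificate-transparency exposure before an attacker's model does. A minimal sketch below assumes the crt.sh-style JSON shape, where each record's `name_value` field may hold several newline-separated hostnames; verify that assumption against the live endpoint before relying on it.

```python
import json

def unique_hostnames(ct_records):
    """Flatten certificate-transparency records into a deduplicated,
    sorted list of hostnames. Each record's 'name_value' may hold
    several newline-separated names (SAN entries)."""
    names = set()
    for rec in ct_records:
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefixes
            if name:
                names.add(name)
    return sorted(names)

# Offline example using the assumed crt.sh JSON shape:
sample = json.loads("""[
  {"name_value": "www.example.com\\ndev.example.com"},
  {"name_value": "*.example.com"},
  {"name_value": "dev.example.com"}
]""")
print(unique_hostnames(sample))
# → ['dev.example.com', 'example.com', 'www.example.com']
# In practice you would fetch https://crt.sh/?q=%25.yourdomain.com&output=json
# and diff the result against your asset inventory.
```

Anything in that output that isn't in your CMDB is exactly what an AI-assembled target dossier will surface first.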

Mythos-class autonomous discovery (forward-looking)

Anthropic withheld Claude Mythos from public release after it autonomously identified CVE-2026-4747 (a 17-year-old FreeBSD NFS root RCE) and reproduced exploits with 83.1% first-try success. Defenders should assume equivalent capabilities will reach motivated adversaries — plan inventory, patch automation, and incident response with that assumption baked in. The B3 vulnerability management page covers the defensive operating model in detail.

Patch-diff exploitation acceleration

Attackers use models to summarize patch diffs published with security releases and propose hypotheses about what was fixed. The same capability is available to defenders, who are usually slower to deploy it. N-day exploitation windows have shrunk meaningfully since 2023.
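A defensive version of that same patch-diff triage can be sketched in a few lines. The risk-pattern list here is purely illustrative, not a vetted ruleset; a real deployment would tune it per codebase and feed the flagged hunks to a reviewer or a model for a "what did this fix?" summary.

```python
import re

# Illustrative patterns only -- a real triage list is tuned per codebase.
RISK_PATTERNS = [
    r"\bmemcpy\b", r"\bstrcpy\b", r"\blen\b", r"\bsize\b",
    r"\bbounds?\b", r"\boverflow\b", r"\bvalidate\b", r"\bsanitize\b",
]

def flag_security_hunks(unified_diff):
    """Return changed lines (added or removed) matching risk patterns --
    a cheap first pass at 'what did this patch actually fix?'."""
    flagged = []
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            if any(re.search(p, line, re.IGNORECASE) for p in RISK_PATTERNS):
                flagged.append(line)
    return flagged

# Hypothetical diff fragment for illustration:
diff = """--- a/nfs.c
+++ b/nfs.c
@@ -10,7 +10,8 @@
-    memcpy(buf, req->data, req->len);
+    if (req->len > sizeof(buf)) return -EINVAL;
+    memcpy(buf, req->data, req->len);
"""
for line in flag_security_hunks(diff):
    print(line)
```

If your patch pipeline can't answer "which changed lines look security-relevant" within hours of a vendor release, assume the adversary's pipeline can.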

Common mistakes

Failure modes to watch for as you build the capability.

Assuming the lure will look obviously fake

Modern AI-generated phishing reads like competent corporate writing. Training on signal patterns (sender, link target, request type, urgency, novel payment instructions) ages better than training on prose quality.
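Signal-based scoring can be sketched as a toy rule set. The signal names, weights, and threshold below are entirely hypothetical and would be tuned against your own labeled mail; the point is that none of the inputs depend on prose quality.

```python
# Hypothetical weights -- real deployments tune these against labeled mail.
SIGNALS = {
    "sender_domain_mismatch": 3,    # display name doesn't match sending domain
    "link_target_mismatch": 3,      # anchor text domain != href domain
    "new_payment_instructions": 4,  # request changes where money goes
    "urgency_language": 1,
    "first_contact_sender": 2,
}

def score_message(observed):
    """Sum the weights of structural signals present in a message.
    None of these depend on prose quality, which AI-generated lures
    no longer give away."""
    return sum(SIGNALS[s] for s in observed if s in SIGNALS)

suspicious = score_message(
    {"sender_domain_mismatch", "new_payment_instructions", "urgency_language"}
)
print(suspicious)  # 8 -- above a hypothetical review threshold of, say, 5
```

A perfectly written message with a domain mismatch and new payment instructions still scores high; a typo-ridden but structurally benign one scores low. That is the inversion the training needs to teach.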

Voice-only identity verification

If your help-desk reset flow accepts a voice as identity, you have a deepfake exposure. Add a callback to a known number or out-of-band step. MGM (2023) is the canonical lesson here, and it predates the worst of the AI capability curve.
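The control can be expressed as a hard gate in the reset workflow. The field names below are hypothetical; the design point is that the claimed (voice) identity never appears in the authorization predicate at all.

```python
from dataclasses import dataclass

@dataclass
class ResetRequest:
    employee_id: str
    caller_claims_identity: bool  # voice / claimed identity -- never sufficient
    callback_verified: bool       # help desk called back a number already on file
    out_of_band_approved: bool    # e.g. manager confirmation via a second channel

def may_reset_mfa(req: ResetRequest) -> bool:
    """Voice alone never authorizes a reset: require a callback to a
    pre-registered number plus one out-of-band confirmation. Note that
    caller_claims_identity is deliberately absent from the predicate."""
    return req.callback_verified and req.out_of_band_approved

# A perfectly convincing cloned voice with no callback is still denied:
print(may_reset_mfa(ResetRequest("e123", True, False, False)))  # False
```

Encoding the rule this way also makes it auditable: you can log which factor was missing on every denied request.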

Pre-AI red team scopes

Red teams that don't model AI-augmented recon and phishing miss a realistic attack path. Update the rules of engagement to permit it, with appropriate guardrails on harm and scope.

Defensive AI as a marketing checkbox

Vendors will sell you 'AI-powered detection of AI-generated content'; ask for the false-positive rate on your data and what it actually triggers on. Most are weaker than they sound, and the arms race favors generators over detectors.

Disbelieving the deepfake risk until it lands

Every deepfake-fraud post-mortem includes 'we knew this was possible but didn't think it would happen to us.' That's the disbelief failure mode, and it's the most consistent contributor to expensive incidents.

Key decisions and tradeoffs

Friction vs. usability

Every additional verification step on a help-desk call lowers vendor-fraud risk and raises legitimate-user friction. The right level is org-specific and worth measuring against actual attack data, not assumed user complaints.
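One way to make "measure against actual attack data" concrete is a simple expected-annual-cost comparison. Every number below is a hypothetical planning input, not a benchmark; the value of the exercise is forcing the friction and fraud sides onto the same axis.

```python
def expected_cost(attack_rate, fraud_loss, catch_rate,
                  calls_per_year, friction_cost_per_call):
    """Expected annual cost of a verification policy: residual fraud
    loss plus the friction imposed on legitimate callers. All inputs
    are hypothetical planning numbers."""
    residual_fraud = attack_rate * fraud_loss * (1 - catch_rate)
    friction = calls_per_year * friction_cost_per_call
    return residual_fraud + friction

# Hypothetical org: 2 attacks/yr at $500k each, 10k legitimate calls/yr.
loose = expected_cost(2, 500_000, 0.50, 10_000, 1.00)   # light verification
strict = expected_cost(2, 500_000, 0.95, 10_000, 4.00)  # callback + out-of-band
print(loose, strict)  # the strict policy wins by a wide margin here
```

Under these illustrative inputs the strict policy costs far less in expectation, even at 4x the per-call friction, because residual fraud dominates. Your own attack-rate and loss numbers may flip that conclusion, which is exactly why they should be measured rather than assumed.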

Detection vs. acceptance

Reliably detecting AI-generated content is unsolved at scale. Accepting that some content will be synthetic and adjusting workflows around it is more durable than chasing detectors that the next generator iteration will defeat.

Awareness training cadence

Quarterly with topical refreshers tends to land better than annual modules. The technique evolution is now fast enough that annual training is calendar-aligned, not threat-aligned.

Disclosure of techniques

Detailed public discussion of offensive AI helps defenders calibrate but also helps attackers. Most credible research now favors high-level discussion with private disclosure of specifics — Anthropic's Mythos handling is a recent reference point worth studying.

Containment vs. disclosure timing

If a deepfake-driven incident lands, the temptation is to delay disclosure until the investigation completes. Customers, partners, and regulators increasingly expect notification within hours, not weeks. Plan the disclosure path in advance.

What good looks like

Your phishing training reflects current AI-generated lures, not 2019 examples. Your help-desk identity verification doesn't accept voice as a single factor. Your wire-approval flow requires dual-channel confirmation regardless of urgency. Your red team has a documented stance on AI-augmented recon and phishing, and runs at least one engagement annually that exercises it. Executives have been briefed on the deepfake risk and the Mythos-class capability curve, and know what controls are in place. The MGM, Caesars, and Arup post-mortems are referenced in your control reviews, not just in vendor pitches.

Tools and platforms in this domain

MITRE ATLAS
Adversary tactics for AI/ML systems
Equivalent of ATT&CK, scoped to AI. Use for threat modeling AI-augmented adversary capabilities.
Atomic Red Team / Caldera
Open-source attack emulation (defensive testing)
Modern releases include AI-augmented technique emulation.
C2PA / Content Credentials
Provenance metadata standard for synthetic content
Useful as an integrity signal where present, not a detection mechanism.
KnowBe4 / Hoxhunt / Cofense
Phishing simulation
Worth asking vendors specifically about their AI-lure update cadence and how often the library reflects techniques observed in the last 90 days.
Reality Defender / Pindrop / Hive
Deepfake and voice-clone detection vendors
Evaluate carefully — claims vary widely and false-positive rates on real corporate audio are higher than marketed.
Project Glasswing partner programs (Anthropic)
Defensive use of restricted-release offensive AI capabilities
Currently limited to ~40 partner orgs. Worth tracking publicly even if you're not a partner.


Curated resources

Authoritative sources we ground AI in Offensive Security questions in — frameworks, research, guides, and tools.

Black Hat / DEF CON · guide

Black Hat / DEF CON Archives

Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks particularly relevant for Pillars B and C.

UIUC · research

Fang et al. — "LLM Agents Can Autonomously Hack Websites" (2024)

Demonstrated GPT-4 exploiting real-world web vulnerabilities autonomously. 73% success rate on day-one CVEs. Key reference for questions about AI-augmented offensive capabilities and the asymmetry debate.

research

Harang & Ruef — "Securing LLMs Against Jailbreaking for Cybersecurity" (2024)

Analysis of how LLMs can be used for offensive security tasks and the implications for defensive guardrails. Covers the dual-use nature of security LLMs.

Bishop Fox · guide

Bishop Fox — AI Red Teaming Research

Research on using AI for penetration testing automation: reconnaissance, vulnerability discovery, exploit generation. Practitioner perspective on what's practical vs. theoretical.

Anthropic · research

Anthropic — "Responsible Scaling Policy" and capability evaluations

Evaluates model capabilities for autonomous cyber operations at each AI Safety Level (ASL). Defines thresholds where AI capability in offensive security requires additional safeguards. Key reference for responsible AI in offensive security.

Europol · framework

Europol — "ChatGPT and the Impact of LLMs on Law Enforcement"

Law enforcement perspective on how LLMs enable cybercrime (phishing, malware, social engineering) and how AI assists threat intelligence and investigation.

PTES · framework

PTES — Penetration Testing Execution Standard

Comprehensive standard for penetration testing methodology. Covers intelligence gathering, threat modeling, vulnerability analysis, exploitation, and reporting.

OWASP · guide

OWASP Web Security Testing Guide

The most comprehensive open-source guide for web application security testing. Covers testing methodology, tools, and techniques.

OWASP · guide

OWASP Testing Guide v4

Detailed testing techniques for identifying web vulnerabilities. Practical, hands-on approach to security assessment.

Red Canary · tool

Atomic Red Team

Library of tests mapped to the MITRE ATT&CK framework. Small, portable detection tests for validating security controls.

MITRE · tool

Caldera — Automated Adversary Emulation

MITRE's automated adversary emulation platform. Runs pre-defined or custom attack sequences to test defenses.

MITRE · tool

MITRE ATT&CK Navigator

Web-based tool for annotating and exploring the ATT&CK matrix. Useful for threat modeling, gap analysis, and red team planning.

Certifications that signal this domain

Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.

Also touched

CRTE · Expert · Altered Security

Certified Red Team Expert

Multi-forest AD compromise — cross-trust abuse, advanced delegation, and persistence in hardened enterprise environments.

CRTP · Professional · Altered Security

Certified Red Team Professional

Hands-on Active Directory attacker — Kerberos abuse, trust attacks, and lateral movement against a real multi-domain forest.

OSAI · Professional · OffSec

OffSec AI Security Practitioner

Offensive AI security — adversarial ML, LLM attacks, agent abuse.



Adjacent concepts and related subdomains

Security awareness training: the discipline that operationalizes resilience to social engineering. AI-generated content makes this more important, not less, and changes what 'good training' looks like.

Deepfake detection: the technical detection side of the same problem. Complementary to the awareness side; neither stands alone.

Red teaming: where AI-augmented offensive techniques get exercised against your environment in a controlled way. A modern engagement scope honestly includes AI-augmented recon and social engineering.

Disinformation and influence operations: the macro version of the same problem, synthetic content used to manipulate at scale, with cybersecurity implications when narratives drive insider behavior.

Vulnerability management (B3): if Mythos-class autonomous discovery becomes broadly available, exploit timelines compress further. The defensive answer lives in B3: better prioritization, AI-validated patches, faster remediation queues.
