AI in Offensive Security
AI-assisted pentesting, automated recon, AI-generated phishing/social engineering, deepfake attacks.
What is AI in Offensive Security?
Attackers have access to the same models you do — and in 2026, sometimes better ones. The cost of producing a convincing spear-phish has dropped from hours of skilled writing to seconds of prompting; the cost of synthetic voice good enough to fool a help desk has dropped from a TV studio to a free Hugging Face model. And with restricted-release tools like Anthropic's Claude Mythos demonstrating autonomous discovery and exploitation of decades-old vulnerabilities (CVE-2026-4747 in FreeBSD NFS, 17 years undetected, 83.1% PoC reproduction first try), the *technical* end of the offensive AI capability curve has compressed timelines that defensive processes were never designed for.
The defensive question isn't 'is this happening' (it is, daily) but 'which of our controls were calibrated for the pre-AI cost curve.' Phishing training built around 'look for typos,' help-desk identity verification by voice, and IR playbooks that assume one targeted attack at a time were all reasonable answers in 2018 — and are reasonable to revisit in 2026.
This page is deliberately defensive in framing. It covers what attackers can do with AI well enough that defenders can model the threat, update their controls, and brief executives accurately. It does not cover offensive procedures — those belong in red team engagements with documented rules of engagement, not in an open editorial.
Why it matters
You can't defend against capabilities you don't believe exist. The most expensive failures of the last few years — MGM and Caesars (2023, vishing the help desk), the Arup deepfake CFO video fraud (2024, $25M wired out of the Hong Kong office), and the Retool voice-clone attack (2023) — all involved capabilities defenders had been told about but hadn't internalized into process. The defensive failure was disbelief, not technology.
AI in offensive security is the adversary's evolving toolkit. Understanding it directly informs defensive strategy, red team scope, threat modeling, and the vulnerability management response (because Mythos-class autonomous discovery shortens patch windows). Defenders who skip this domain build threat models for the wrong decade.
Why this matters operationally
Two structural shifts matter most. First, the marginal cost of a high-quality social engineering attack has collapsed. Targeted, multilingual, context-aware lures that used to require skilled human attackers now require minutes of prompting. The implication: filter-based defenses help with volume, but the targeted attack at the top of the curve is more dangerous and more common.
Second, technical attack capabilities are scaling on a different curve. Tools like Claude Mythos demonstrate that AI can autonomously find and exploit complex vulnerabilities in code that survived decades of human review. Anthropic chose not to release Mythos publicly because the offensive capability outran defensive readiness — a reasonable decision and not a permanent one. Defenders should plan as if equivalent capabilities will be available to motivated adversaries within a 12–24 month horizon.
Where this shows up in practice
A campaign generates a unique, contextually relevant lure for every recipient, pulled from public profiles, written in the recipient's native language with reference to recent meetings and projects. Click rates climb because nothing pattern-matches as a template. The defensive response isn't 'better filters' alone — it's process changes around the actions phishing tries to trigger (wire approvals, credential resets, code-signing requests).
In 2023, attackers used a cloned voice of a Retool IT team member in a vishing call against another employee, leading to a compromise that exposed customer data. In early 2024, attackers used a deepfake video conference impersonating Arup's CFO and other staff to convince a finance employee in the Hong Kong office to authorize $25M in wire transfers. Modern help desks now require callback to a known number; modern wire-approval flows require dual-channel confirmation regardless of urgency. Both incidents predate the worst of the 2025–2026 capability curve.
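That dual-channel pattern is concrete enough to encode. A minimal sketch of an approval gate — `Confirmation` and `release_wire` are illustrative names, not a real treasury API — in which urgency is deliberately not an input, because urgency is the one field the attacker always controls:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Confirmation:
    approver_id: str
    channel: str        # e.g. "signed_portal", "callback", "video_call"
    out_of_band: bool   # True if we initiated the contact, not the requester

def release_wire(confirmations: list[Confirmation]) -> bool:
    """Release funds only on two independent approvers over two distinct
    channels, with at least one confirmation initiated out-of-band by us.
    Urgency is deliberately not a parameter: it cannot buy a bypass."""
    approvers = {c.approver_id for c in confirmations}
    channels = {c.channel for c in confirmations}
    return (
        len(approvers) >= 2
        and len(channels) >= 2
        and any(c.out_of_band for c in confirmations)
    )

# A deepfaked video call plus an email "from the CFO" fails twice over:
# one claimed approver, and nothing initiated out-of-band by us.
assert not release_wire([
    Confirmation("cfo", "video_call", out_of_band=False),
    Confirmation("cfo", "email", out_of_band=False),
])
```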
Scattered Spider used a 10-minute social-engineering call against MGM's IT help desk to reset MFA on a privileged account, leading to a $100M+ outage. Caesars reportedly paid a ~$15M ransom after a parallel attack attributed to the same group. Neither attack required AI — but both illustrate the help-desk identity-verification weaknesses that AI-driven voice cloning makes dramatically worse.
Public assets, certificate transparency logs, GitHub history, and breach data correlate automatically into a target dossier in minutes. Asset inventory hygiene becomes a defensive control, not a CMDB chore — what's discoverable about you is what attackers will use.
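A useful exercise is to run the attacker's first recon step against yourself. A minimal sketch using the public crt.sh certificate-transparency search (assuming the third-party requests library; crt.sh's JSON response shape and rate limits aren't guaranteed):

```python
import requests

def ct_exposed_hostnames(domain: str) -> set[str]:
    """Hostnames that CT logs already reveal for *.domain."""
    resp = requests.get(
        "https://crt.sh/",
        params={"q": f"%.{domain}", "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    names: set[str] = set()
    for cert in resp.json():
        # name_value can hold several newline-separated SAN entries
        for name in cert.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    return names

if __name__ == "__main__":
    for host in sorted(ct_exposed_hostnames("example.com")):
        print(host)  # diff against your asset inventory
```

Anything this returns that your CMDB doesn't already know about is attacker-visible shadow IT.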
Anthropic withheld Claude Mythos from public release after it autonomously identified CVE-2026-4747 (a 17-year-old FreeBSD NFS root RCE) and reproduced exploits with 83.1% first-try success. Defenders should assume equivalent capabilities will reach motivated adversaries — plan inventory, patch automation, and incident response with that assumption baked in. The B3 vulnerability management page covers the defensive operating model in detail.
Attackers use models to summarize the patch diffs published with security releases and to propose hypotheses about what was fixed. The same capability is available to defenders, who are usually slower to deploy it. N-day exploitation windows have shrunk meaningfully since 2023.
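Closing that deployment gap doesn't take much. A sketch of model-assisted patch triage, assuming the Anthropic Python SDK with ANTHROPIC_API_KEY set in the environment; the model name is illustrative and the prompt is a starting point, not a product:

```python
import anthropic

def triage_patch_diff(diff_text: str) -> str:
    """Ask a model what a vendor security patch probably fixed,
    to inform prioritization before public analysis lands."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=600,
        messages=[{
            "role": "user",
            "content": (
                "This diff shipped with a vendor security release.\n"
                "1. What vulnerability class was most likely fixed?\n"
                "2. How reachable does the patched code path look?\n"
                "3. Should this patch jump the remediation queue?\n\n"
                + diff_text
            ),
        }],
    )
    return msg.content[0].text
```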
Common mistakes
Failure modes to watch for as you build the capability.
Modern AI-generated phishing reads like competent corporate writing. Training on signal patterns (sender, link target, request type, urgency, novel payment instructions) ages better than training on prose quality.
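To make 'signal patterns' concrete, a minimal sketch — the signal names and weights here are placeholders to tune against your own mail corpus, not a vetted model:

```python
# Placeholder weights; tune against your own mail corpus.
SIGNAL_WEIGHTS = {
    "first_time_sender": 2,         # no prior thread with this recipient
    "link_display_mismatch": 3,     # anchor-text domain != href domain
    "requests_credentials": 3,
    "new_payment_instructions": 4,  # novel banking details or payee
    "urgency_language": 1,          # "today", "immediately", "before 5pm"
}

def phish_score(signals: set[str]) -> int:
    return sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals)

# Deliberately absent: any prose-quality feature. AI-written lures read
# cleanly, so grammar- and typo-based features decay by design.
assert phish_score({"first_time_sender", "new_payment_instructions"}) == 6
```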
If your help-desk reset flow accepts a voice as identity, you have a deepfake exposure. Add a callback to a known number or another out-of-band verification step. MGM (2023) is the canonical lesson here, and it predates the worst of the AI capability curve.
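The shape of that fixed flow, sketched — DIRECTORY, place_callback, and push_approved are hypothetical stand-ins for your directory of record, telephony stack, and MFA provider:

```python
DIRECTORY = {"e1042": "+1-555-0100"}  # employee id -> number on file

def place_callback(number: str) -> bool:
    """Stub: we initiate an outbound call to the registered number."""
    print(f"calling back {number} ...")
    return True

def push_approved(employee_id: str) -> bool:
    """Stub: MFA push to the registered device, awaiting approval."""
    return True

def handle_reset(claimed_id: str, inbound_caller_id: str) -> bool:
    """Grant a credential/MFA reset without ever trusting the call.
    inbound_caller_id and the voice on the line are routing metadata
    only — spoofable and cloneable — so neither appears below."""
    number_on_file = DIRECTORY.get(claimed_id)
    if number_on_file is None:
        return False
    return place_callback(number_on_file) and push_approved(claimed_id)
```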
Red teams that don't model AI-augmented recon and phishing miss a realistic attack path. Update the rules of engagement to permit it, with appropriate guardrails on harm and scope.
Vendors will sell you 'AI-powered detection of AI-generated content'; ask for the false-positive rate on your data and what it actually triggers on. Most are weaker than they sound, and the arms race favors generators over detectors.
Every deepfake-fraud post-mortem includes 'we knew this was possible but didn't think it would happen to us.' That's the disbelief failure mode, and it's the most consistent contributor to expensive incidents.
Key decisions and tradeoffs
Every additional verification step on a help-desk call lowers impersonation-fraud risk and raises legitimate-user friction. The right level is org-specific and worth measuring against actual attack data, not assumed user complaints.
Reliably detecting AI-generated content is unsolved at scale. Accepting that some content will be synthetic and adjusting workflows around it is more durable than chasing detectors that the next generator iteration will defeat.
Quarterly training with topical refreshers tends to land better than annual modules. Technique evolution is now fast enough that annual training is calendar-aligned, not threat-aligned.
Detailed public discussion of offensive AI helps defenders calibrate but also helps attackers. Most credible research now favors high-level discussion with private disclosure of specifics — Anthropic's Mythos handling is a recent reference point worth studying.
If a deepfake-driven incident lands, the temptation is to delay disclosure until the investigation completes. Customers, partners, and regulators increasingly expect notification within hours, not weeks. Plan the disclosure path in advance.
What good looks like
Your phishing training reflects current AI-generated lures, not 2019 examples. Your help-desk identity verification doesn't accept voice as a single factor. Your wire-approval flow requires dual-channel confirmation regardless of urgency. Your red team has a documented stance on AI-augmented recon and phishing, and runs at least one engagement annually that exercises it. Executives have been briefed on the deepfake risk and the Mythos-class capability curve, and know what controls are in place. The MGM, Caesars, and Arup post-mortems are referenced in your control reviews, not just in vendor pitches.
Curated resources
Authoritative sources we ground AI in Offensive Security questions in — frameworks, research, guides, and tools.
Black Hat / DEF CON Archives
Conference presentations covering novel attack techniques and defensive research. Essential for cutting-edge offensive/defensive questions. AI Village talks particularly relevant for Pillars B and C.
Fang et al. — "LLM Agents Can Autonomously Hack Websites" (2024)
Demonstrated GPT-4-based agents autonomously exploiting real-world web vulnerabilities, with a 73% success rate; a follow-up paper by the same group reported 87% success against one-day CVEs. Key reference for questions about AI-augmented offensive capabilities and the asymmetry debate.
Harang & Ruef — "Securing LLMs Against Jailbreaking for Cybersecurity" (2024)
Analysis of how LLMs can be used for offensive security tasks and the implications for defensive guardrails. Covers the dual-use nature of security LLMs.
Bishop Fox — AI Red Teaming Research
Research on using AI for penetration testing automation: reconnaissance, vulnerability discovery, exploit generation. Practitioner perspective on what's practical vs. theoretical.
Anthropic — "Responsible Scaling Policy" and capability evaluations
Evaluates model capabilities for autonomous cyber operations at each AI Safety Level (ASL). Defines thresholds where AI capability in offensive security requires additional safeguards. Key reference for responsible AI in offensive security.
Europol — "ChatGPT and the Impact of LLMs on Law Enforcement"
Law enforcement perspective on how LLMs enable cybercrime (phishing, malware, social engineering) and how AI assists threat intelligence and investigation.
PTES — Penetration Testing Execution Standard
Comprehensive standard for penetration testing methodology. Covers intelligence gathering, threat modeling, vulnerability analysis, exploitation, and reporting.
OWASP Web Security Testing Guide
The most comprehensive open-source guide for web application security testing. Covers testing methodology, tools, and techniques.
OWASP Testing Guide v4
Detailed testing techniques for identifying web vulnerabilities. Practical, hands-on approach to security assessment.
Atomic Red Team
Library of tests mapped to the MITRE ATT&CK framework. Small, portable detection tests for validating security controls.
Caldera — Automated Adversary Emulation
MITRE's automated adversary emulation platform. Runs pre-defined or custom attack sequences to test defenses.
MITRE ATT&CK Navigator
Web-based tool for annotating and exploring the ATT&CK matrix. Useful for threat modeling, gap analysis, and red team planning.
Certifications that signal this domain
Credentials whose blueprint meaningfully covers this domain. Core means centrally covered; also touched means present in the blueprint but not the primary focus.
Also touched
Certified Red Team Expert
Multi-forest AD compromise — cross-trust abuse, advanced delegation, and persistence in hardened enterprise environments.
Certified Red Team Professional
Hands-on Active Directory attacker — Kerberos abuse, trust attacks, and lateral movement against a real multi-domain forest.
OffSec AI Security Practitioner
Offensive AI security — adversarial ML, LLM attacks, agent abuse.
Browse all certifications → — pick a cert on the interactive map to highlight every domain it covers.
Adjacent concepts and related subdomains
The discipline that operationalizes resilience to social engineering. AI-generated content makes this more important, not less — and changes what 'good training' looks like.
The technical detection side of the same problem. Complementary to the awareness side — neither stands alone.
Where AI-augmented offensive techniques get exercised against your environment in a controlled way. An honest modern engagement scope includes AI-augmented recon and social engineering.
The macro version of the same problem — synthetic content used to manipulate at scale, with cybersecurity implications when narratives drive insider behavior.
If Mythos-class autonomous discovery becomes broadly available, exploit timelines compress further. The defensive answer lives in B3 — better prioritization, AI-validated patches, faster remediation queues.
Explore next
A short, opinionated reading order from here.
Security Awareness & Human Factors
Phishing simulation, security culture measurement, behavioral psychology, insider threat programs, social engineering defense training.
C9 Deepfakes & Synthetic Media
Deepfake detection, synthetic voice/video attacks, identity verification bypass, C2PA standards.
B3 AI for Vulnerability Management
AI-assisted code review, predictive vulnerability prioritization (EPSS), automated patch assessment.
More in Applied AI in Security
Practice B4 the way you'd be tested on it
329 questions available. Mixed-difficulty questions sourced from real practitioner scenarios.