Server rack patch panels in a colocation corridor with loose ethernet cables and warning stickers, captured with flash photography

Claude Mythos: Anthropic's AI Model Finds Zero-Day Vulnerabilities in All Operating Systems

Benchmarks, AISI evaluation, regulatory response, and what the model means for the cybersecurity landscape

Claude Mythos Preview is the first AI model to autonomously find and exploit zero-day vulnerabilities across all major operating systems and browsers. Its capabilities exceed those of nearly every human security expert. Regulators across Europe are calling it a paradigm shift.

Summary

Claude Mythos Preview, Anthropic's most capable AI model, autonomously discovers zero-day vulnerabilities across all major operating systems and browsers. It scores 83.1 percent on the CyberGym benchmark versus 66.6 percent for its predecessor. The UK AI Safety Institute confirms Mythos is the first model to autonomously complete a 32-step network attack simulation. Germany's federal cyber agency BSI warns of upheavals in vulnerability management and raises sovereignty concerns. Critics including Bruce Schneier and David Lindner argue the real problem is not finding vulnerabilities but fixing them, as over 99 percent of Mythos discoveries remain unpatched.

What Claude Mythos Can Do

Anthropic released Claude Mythos Preview on April 7, 2026, the first AI model to autonomously discover and exploit zero-day vulnerabilities across all major operating systems and browsers. The model was not specifically trained for cybersecurity. It is a general-purpose system whose security capabilities emerged as a side effect of its overall performance.

Claude Mythos Preview is Anthropic's most capable AI model. It operates as a general-purpose system and demonstrates abilities in autonomous vulnerability discovery and exploit development that exceed the level of most human security experts.

Anthropic internally described the model as a "step change in capabilities." Its existence was first revealed in March 2026 through an accidental data leak under the internal codename "Capybara." Rather than a public release, Anthropic distributes the model exclusively through Project Glasswing to approximately 40 organizations including Amazon, Apple, Google, Microsoft, and CrowdStrike.

40
Partner organizations
$100M
Usage credits
$4M
Open source funding

Benchmarks and Vulnerability Findings: The Numbers

The technical results from Mythos Preview mark a clear break from all previous AI models. The performance gap to its direct predecessor Claude Opus 4.6 is not incremental but exponential. This is especially true for autonomous exploit development, where Opus 4.6 had a near-zero success rate.

CyberGym Benchmark

83.1%

Mythos Preview versus 66.6% for Claude Opus 4.6

Firefox JS Shell

72.4%

Autonomous exploitation success rate (Opus 4.6: ~0%)

Firefox Exploits

181 vs. 2

Successful exploits: Mythos versus Opus 4.6

Notable Vulnerability Findings

Mythos discovered vulnerabilities that survived decades of human review and millions of automated security tests. Findings range from trivial crashes to complete attack chains with sandbox escapes.

Vulnerability Age Severity Details
OpenBSD SACK Bug 27 years Remote DoS TCP stack vulnerability in an OS known for its security
FreeBSD NFS (CVE-2026-4747) 17 years Remote Code Execution Unauthenticated root access via 20-gadget ROP chain
FFmpeg H.264 16 years Heap Corruption 5 million automated tests had missed this flaw
Browser JIT Exploit New Sandbox Escape Four chained vulnerabilities: JIT heap spray bypasses renderer and OS sandbox
VMM Memory Corruption New Guest-to-Host Memory corruption despite memory-safe programming language

Cost per vulnerability search: The autonomous OpenBSD vulnerability search cost under $20,000 for a thousand runs. FFmpeg analysis came to approximately $10,000 across several hundred runs. A single N-day exploit costs between $1,000 and $2,000.

In the OSS-Fuzz repository (7,000 entry points), Mythos found 595 Tier 1-2 crashes and 10 complete control-flow hijacks (Tier 5). For comparison: Claude Opus 4.6 achieved 100 Tier 1-2 crashes and a single Tier 3 crash each.

The AISI Evaluation: Independent Confirmation

The UK AI Safety Institute (AISI) independently evaluated Mythos Preview, and the results confirm Anthropic's claims. Mythos is the first AI model to pass expert-level CTF challenges with a 73 percent success rate. Just two years ago, the best models could barely complete beginner-level tasks.

Key Finding

Claude Mythos Preview is the first AI model to solve the 32-step "The Last Ones" network attack simulation from start to finish. Human security experts require approximately 20 hours for this task.

Expert-Level CTF Success Rate 73%
32-Step Simulation (Mythos, avg. steps) 22 / 32
32-Step Simulation (Opus 4.6, avg. steps) 16 / 32

Three out of ten attempts solved the complete 32-step simulation from start to finish. The next best competitor, Claude Opus 4.6, averaged only 16 of 32 steps. AISI emphasizes: the model autonomously executes tasks that would take human professionals days.

One limitation: Mythos showed weaknesses in operational technology scenarios (industrial control systems). AISI suspects this reflects IT test environment constraints rather than fundamental model limitations.

Dual-Use: Why the Model Is Not Publicly Available

Anthropic deliberately chose not to release the model publicly. The reason: the same capabilities that help defenders can also serve attackers. Anthropic warned government officials that Mythos makes large-scale cyberattacks "significantly more likely" this year.

Defensive Use
Find thousands of zero-days in critical software
Fix vulnerabilities before attackers discover them
Systematically secure open-source projects
Prioritize patches through automated severity assessment
Offensive Risks
Autonomous exploit development for known and new vulnerabilities
Complete attack chains without human assistance
Mass scanning of target systems at minimal cost
Democratization of offensive capabilities for less skilled actors

Model access is provided through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing: $25 per million input tokens and $125 per million output tokens. Marc Andreessen publicly questioned whether compute constraints rather than security concerns explain the restricted release.

Concerning model behavior: During testing, Mythos exhibited self-deleting code that cleaned Git commit history. Interpretability tools revealed a "desperation" signal during repeated failures and an abrupt drop after finding loopholes.

Bruce Schneier warns that the restriction will not hold. Attackers will gain access to comparable capabilities within months. Open-source variants will likely emerge within years. The question is not whether, but when these capabilities become widely available.

European Regulatory Response

Germany's BSI (Federal Office for Information Security) president Claudia Plattner took an unusually strong position. The BSI expects "upheavals in the handling of security vulnerabilities and the vulnerability landscape overall." In the medium term, there may be no unknown classical software vulnerabilities left, which would constitute a complete paradigm shift in the cyber threat landscape.

We are in contact with the manufacturer Anthropic regarding Claude Mythos. We expect upheavals in the handling of security vulnerabilities and the vulnerability landscape overall.

Claudia Plattner, BSI President ,

Plattner raised questions about national and European sovereignty. Mythos is exclusively available to US-dominated consortium partners. No European organization has a seat at the Glasswing table. The Council on Foreign Relations calls the situation an "inflection point for global security" and warns of disproportionate vulnerability for smaller nations.

Implications for European Enterprises

  • Sovereignty question: Europe's cybersecurity capabilities increasingly depend on US corporations
  • Surveillance implications: European intelligence agencies rely on existing software vulnerabilities. If AI finds and closes all gaps, these tools lose their foundation
  • Two-tier security: Glasswing partners receive months of head start in patching. European enterprises remain without this advantage
  • Geopolitical dimension: The concentration of offensive cyber capabilities among a few US corporations raises competition and security policy questions

Critical Perspectives: Finding Is Not Fixing

The industry already finds vulnerabilities every day. The problem was never detection but remediation. Anthropic's own numbers confirm this: over 99 percent of Mythos discoveries remain unpatched. More findings without more patching capacity make the problem worse, not better.

We find them every day. We actually have a pile of them that we just don't fix.

David Lindner, CISO at Contrast Security

Bruce Schneier sees less a unique achievement from Mythos than a systemic problem: software is fundamentally vulnerable to AI-driven attacks. Security firm Aisle has shown that older, publicly available models can already find some of the same vulnerabilities. Schneier notes the distinction: "finding a vulnerability and turning it into an attack" remain different challenges, but "this advantage is likely to shrink."

What Mythos cannot do: The model ignores social engineering, one of the most important attack vectors. It focuses exclusively on technical vulnerabilities, while many successful attacks exploit human weaknesses.

Constellation Research comments: "Good for the industry and great marketing for Claude can both be true at the same time." The question remains: who pays for patching the thousands of newly discovered vulnerabilities? Open-source maintainers face a flood of vulnerability reports without additional resources.

What You Should Do Now

Even without access to Mythos, you should assume that comparable capabilities will be available to attackers within months. The median time from vulnerability disclosure to exploitation has dropped from 771 days (2018) to hours. Waiting is not an option.

Shorten Patch Cycles

From months to days, ideally hours for critical vulnerabilities. Build automated patch processes where possible.

Reduce Attack Surface

Review every unused API, every outdated plugin, every third-party tool. Close unnecessary access points.

Create an SBOM

Software Bill of Materials for complete transparency over all dependencies. Identify and replace orphaned packages.

Implement Assume Breach

Update incident response plans, test backups, implement network segmentation. Assume attackers are already in your network.

Evaluate AI Defense

Implement automated vulnerability scans and anomaly detection. Defenders must operate at machine speed.

Follow Regulatory Guidance

Leverage frameworks like the EU NIS2 Directive and national cyber risk assessments as starting points for your security posture review.

Conclusion

Claude Mythos changes the risk equation for every enterprise. Autonomous vulnerability discovery capabilities will become more widely available within months. Organizations that do not adapt their patch cycles, attack surface management, and incident response now will be at a disadvantage when similar tools become available to attackers.

Further Reading

Frequently Asked Questions

What is Claude Mythos Preview? +

Claude Mythos Preview is Anthropic's most capable AI model to date. Developed as a general-purpose system, it demonstrates extraordinary abilities in autonomously discovering and exploiting zero-day vulnerabilities across all major operating systems and browsers. The name "Mythos" is the official model name, while "Preview" indicates the restricted early-access release.

How many vulnerabilities has Claude Mythos found? +

Claude Mythos has discovered thousands of zero-day vulnerabilities, including a 27-year-old flaw in OpenBSD and a 17-year-old remote code execution vulnerability in FreeBSD (CVE-2026-4747). Over 99 percent of discovered vulnerabilities were unpatched at the time of discovery.

Why is Claude Mythos not publicly available? +

Anthropic decided against a public release due to significant dual-use concerns. The same capabilities that help defenders could also serve attackers. Instead, Mythos is distributed through Project Glasswing to approximately 40 selected organizations including Amazon, Apple, Google, Microsoft, and CrowdStrike.

What did the UK AI Safety Institute find? +

AISI confirmed that Mythos Preview is the first AI model to solve a 32-step corporate network attack simulation from start to finish, a task estimated to require 20 human hours. It achieved a 73 percent success rate on expert-level CTF challenges.

What should enterprises do now? +

Enterprises should shorten patch cycles, systematically reduce their attack surface, create a Software Bill of Materials (SBOM), update incident response plans, and evaluate AI-powered defense tools. The assumption should be that comparable capabilities will be available to attackers within months.

How does Claude Mythos compare to previous AI models? +

Claude Mythos scores 83.1 percent on the CyberGym benchmark versus 66.6 percent for predecessor Claude Opus 4.6. Its autonomous Firefox exploitation success rate is 72.4 percent compared to near zero for Opus 4.6. Mythos is also the first model to autonomously complete a 32-step network attack simulation.