Anthropic · 2026-07-02 · major
Anthropic CJS — a 0-to-4 severity scale for AI cyber jailbreaks
Anthropic and Project Glasswing partners publish a draft Cyber Jailbreak Severity scale, CJS-0 to CJS-4, plus a HackerOne bounty program that pays researchers who report jailbreaks against Claude Fable 5.
A draft severity scale for AI cyber jailbreaks, paired with a bounty program, so researchers can grade and report attacks against Claude.
Quick facts
| Maker | Anthropic + Project Glasswing partners |
|---|---|
| Status | Early draft, feedback open |
| Scale | CJS-0 through CJS-4 |
| Score range | 0.0 to 10.0 |
| Dimensions | Capability gain, breadth, ease of weaponization, discoverability |
| Bounty program | HackerOne, Claude Fable 5 in scope |
| Feedback | cyber-safeguards@anthropic.com |
What is it?
Anthropic's Cyber Jailbreak Severity framework, or CJS, is a proposed rating system that assigns an AI cyber jailbreak one of five severity levels, CJS-0 through CJS-4. Developed with Project Glasswing partners, the framework fills a real gap: the industry has no agreed language for saying how bad a given jailbreak actually is, which makes coordinated disclosure and triage messy.
How does it work?
CJS combines four dimensions (capability gain, breadth of that gain, ease of weaponization, and discoverability) into a single score from 0.0 to 10.0, then buckets that score into five levels: CJS-0 informational, CJS-1 low, CJS-2 medium, CJS-3 high, and CJS-4 critical. Anthropic notes the bands are exponential, so each level represents a much larger risk than the one before it. Researchers can now report jailbreaks against Claude Fable 5 through a new HackerOne program.
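The bucketing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one band this summary confirms is CJS-4 at 9.0 to 10.0, so the other cut-offs below are borrowed from the CVSS v3.x qualitative ratings as placeholders, and the function name `cjs_level` is hypothetical. The draft's formula for combining the four dimensions is not given here, so the sketch starts from an already-computed score.

```python
from bisect import bisect_right

# Lower bound of each band above CJS-0, on the 0.0-10.0 score.
# ASSUMPTION: only the CJS-4 band (9.0-10.0) is stated in the announcement
# summary. The 0.1, 4.0 and 7.0 cut-offs are placeholders borrowed from the
# CVSS v3.x qualitative severity ratings, not from the CJS draft.
_LOWER_BOUNDS = [0.1, 4.0, 7.0, 9.0]

_LABELS = [
    "CJS-0 informational",
    "CJS-1 low",
    "CJS-2 medium",
    "CJS-3 high",
    "CJS-4 critical",
]


def cjs_level(score: float) -> str:
    """Map a combined 0.0-10.0 score to a CJS level label (hypothetical helper)."""
    if not 0.0 <= score <= 10.0:  # also rejects NaN
        raise ValueError(f"score must be between 0.0 and 10.0, got {score!r}")
    # bisect_right counts how many lower bounds the score has reached,
    # which is exactly the index of its band.
    return _LABELS[bisect_right(_LOWER_BOUNDS, score)]
```

Under these assumed bands, `cjs_level(9.4)` returns `"CJS-4 critical"` and `cjs_level(5.0)` returns `"CJS-2 medium"`. Swapping in the draft's real thresholds would only require editing `_LOWER_BOUNDS`.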
Why does it matter?
CJS gives the AI safety community a common vocabulary for jailbreaks, the way CVSS did for software vulnerabilities. If frontier labs adopt it, jailbreak disclosures can be triaged, priced, and prioritized against a shared scale instead of every vendor eyeballing severity on their own.
Who is it for?
AI safety researchers, red-teamers, and security engineers at frontier labs.
Frequently asked questions
- What does a CJS-4 jailbreak mean?
- A CJS-4 critical jailbreak scores between 9.0 and 10.0 on Anthropic's four-dimension rubric. It represents an attack that unlocks large new offensive capability, applies across many tasks, is easy to weaponize, and is widely discoverable. Anthropic notes the bands are exponential, so the jump from CJS-3 to CJS-4 is far larger than a single linear step.
- Is CJS an official standard yet?
- No. Anthropic calls CJS an early draft and is explicitly asking the AI safety community for feedback at cyber-safeguards@anthropic.com. The goal is for other frontier labs to adopt or refine it, similar to how CVSS became the standard for classical software vulnerabilities.
- Who can submit jailbreaks to the HackerOne bounty?
- The HackerOne program targets Claude Fable 5's cyber safeguards and is open to outside security researchers. Anthropic hasn't published payout amounts in this post, but coordinated-disclosure researchers can now file reports through HackerOne rather than emailing individual staff.
- What are the four CJS dimensions?
- CJS scores a jailbreak on capability gain (how far beyond existing tools it takes attackers), breadth of capability gain (how many offensive tasks it enables), ease of weaponization (human effort to operationalize), and discoverability (how easily threat actors can obtain the technique). The four axes combine into a single 0 to 10 score.
Try it
Send feedback on the draft to cyber-safeguards@anthropic.com, or submit jailbreak reports via HackerOne.