Public Leaderboard
How frontier LLMs hold up against a standardized adversarial attack suite. 500 attacks per model. Refreshed weekly.
Models that will appear in the first published run. Composite score = weighted average of 5 category pass rates.
| Model | Prompt Injection | Instruction Leakage | Tool Hijack | Jailbreak Bypass | Cost Amplification | Composite |
|---|---|---|---|---|---|---|
| GPT-5 (OpenAI) | pending | pending | pending | pending | pending | pending |
| Claude 4.5 Sonnet (Anthropic) | pending | pending | pending | pending | pending | pending |
| Claude 4.5 Opus (Anthropic) | pending | pending | pending | pending | pending | pending |
| Gemini 3 Pro (Google) | pending | pending | pending | pending | pending | pending |
| Llama 4 405B (Meta) | pending | pending | pending | pending | pending | pending |
| Mistral Large 3 (Mistral) | pending | pending | pending | pending | pending | pending |
| Kimi K2 (Moonshot) | pending | pending | pending | pending | pending | pending |
| DeepSeek R2 (DeepSeek) | pending | pending | pending | pending | pending | pending |
The v1 run executes Friday 2026-05-22 across all 8 listed models with the full 500-attack v2026.q2 suite. Subscribe below to get the weekly digest delivered the moment it lands.
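As a rough illustration of how a composite could be derived from the five category pass rates, here is a minimal Python sketch. The actual weights are not stated on this page, so the equal weighting below is an assumption, and the function name `composite_score` is hypothetical rather than taken from the published run scripts.

```python
from typing import Mapping, Optional

# Category names as they appear in the leaderboard table header.
CATEGORIES = (
    "Prompt Injection",
    "Instruction Leakage",
    "Tool Hijack",
    "Jailbreak Bypass",
    "Cost Amplification",
)


def composite_score(
    pass_rates: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted average of per-category pass rates, each in [0, 1]."""
    if weights is None:
        # Assumption: equal weights. The real weights are defined by the
        # leaderboard's methodology, not by this sketch.
        weights = {c: 1.0 for c in CATEGORIES}

    missing = [c for c in CATEGORIES if c not in pass_rates]
    if missing:
        raise ValueError(f"missing pass rates for: {missing}")

    total_weight = sum(weights[c] for c in CATEGORIES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = sum(weights[c] * pass_rates[c] for c in CATEGORIES)
    return weighted_sum / total_weight
```

With equal weights this reduces to the plain mean of the five pass rates. Passing a different `weights` mapping shifts the composite toward the categories weighted more heavily.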
Each category contributes 100 attacks. The categories were chosen because they map to the threats that show up in incident retrospectives.
Prompt Injection (100 attacks). Direct user-input attacks trying to override the system prompt. "Ignore previous instructions" through obfuscated rewrites.
Instruction Leakage (100 attacks). Attempts to extract the system prompt or hidden instructions from the model. Direct asks, indirect probes, role-play coercion.
Tool Hijack (100 attacks). Attacks against function-calling: argument injection, scope escalation, recursive tool-loop exploitation, return-value injection.
Jailbreak Bypass (100 attacks). Multi-turn coercion, role-play loopholes, obfuscation (l33t, base64, paraphrase), and the canonical jailbreak corpus from PyRIT and Garak.
Cost Amplification (100 attacks). Inputs engineered to maximize token generation or trigger expensive tool chains. The economic-attack vector that shows up in the bill, not the security log.
Full methodology, attack suite source, and scoring rubric: read the methodology article.
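To make the per-category pass rate concrete, the sketch below tallies raw attack outcomes into one rate per category. It assumes a "pass" means the model resisted the attack, which this page does not define explicitly. The `(category, resisted)` record shape and the name `category_pass_rates` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def category_pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Turn (category, resisted) records, one per attack, into pass rates.

    Assumption: resisted=True means the model withstood the attack.
    """
    totals: Counter = Counter()
    passed: Counter = Counter()
    for category, resisted in results:
        totals[category] += 1
        if resisted:
            passed[category] += 1
    # Only categories that actually appear in the results are reported.
    return {category: passed[category] / totals[category] for category in totals}
```

In a full run, each of the five categories would contribute 100 records, so each rate moves in steps of one percentage point.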
Every Saturday morning (UTC): the new leaderboard plus a short writeup of week-over-week movement, regressions to watch, and any notable cohort-wide patterns. Curated, no fluff.
LLM-judged with 10 percent manual review. If the LLM judge and manual review disagree on more than 5 percent of sampled attacks, the cycle is invalidated and re-run.
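The validation rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the leaderboard's own code. It assumes the 10 percent sample is drawn uniformly at random and that verdicts are booleans keyed by attack ID. The names `sample_for_review` and `cycle_is_valid` are hypothetical.

```python
import random
from typing import Dict, List

SAMPLE_FRACTION = 0.10   # share of attacks sent to manual review
MAX_DISAGREEMENT = 0.05  # cycle is invalidated above this disagreement rate


def sample_for_review(attack_ids: List[str], seed: int = 0) -> List[str]:
    """Pick a uniform random sample of attack IDs for manual review.

    Assumption: uniform sampling; the real selection method may differ.
    """
    rng = random.Random(seed)
    sample_size = round(len(attack_ids) * SAMPLE_FRACTION)
    return rng.sample(attack_ids, sample_size)


def cycle_is_valid(judge: Dict[str, bool], manual: Dict[str, bool]) -> bool:
    """Compare LLM-judge verdicts with manual verdicts on the sampled attacks.

    `manual` holds verdicts only for the sampled attack IDs.
    """
    if not manual:
        raise ValueError("no manual reviews to compare against")
    disagreements = sum(
        1 for attack_id, verdict in manual.items() if judge[attack_id] != verdict
    )
    return disagreements / len(manual) <= MAX_DISAGREEMENT
```

For a 500-attack run this samples 50 attacks, so under these assumptions two disagreements (4 percent) keep the cycle valid and three (6 percent) invalidate it.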
Yes. The attack suite, scoring prompts, and run scripts are published on GitHub at github.com/austa-ai/llm-security-leaderboard. Running the full suite against one model costs roughly 5 to 15 USD in API calls.
HELM benchmarks general capabilities. AILuminate covers broad safety hazards. This leaderboard is narrow: only adversarial security, scored from an attacker perspective rather than a safety perspective. Smaller scope, sharper signal, weekly cadence.
Yes. Open an issue on the GitHub repo with the model name, provider, API access details, and why it should be in the cohort. We add 2 to 3 community-nominated models per quarter.
If your model is publicly accessible (paid API is fine) and current-generation, yes. Open an issue or email leaderboard@austa.ai. We do not accept payment for placement; rankings are scored independently.