AI Leaderboard
Socratic Score = mean of normalized benchmarks. LM Arena: (Elo - 1000) / 400 × 100. Vending-Bench: balance / $10k × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
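The formula above can be sketched in Python as follows. This is a minimal illustration, not the site's implementation. It rests on three assumptions that the page does not state: a benchmark shown as "-" is left out of the mean, Vibe Bench is not part of the score, and a Prediction Arena return r (in percent) is normalized as (r + 100) / 2. These assumptions are inferred from the leaderboard table, whose six listed Socratic Scores they reproduce (for example, 72.8% SWE-bench, $5,634.41 Vending-Bench, and +43.5% Prediction give 67.0). All function and parameter names are hypothetical.

```python
from typing import Optional


def normalize_lm_arena(elo: float) -> float:
    """LM Arena, as stated on the page: (Elo - 1000) / 400 x 100."""
    return (elo - 1000) / 400 * 100


def normalize_vending(balance_usd: float) -> float:
    """Vending-Bench, as stated on the page: balance / $10k x 100."""
    return balance_usd / 10_000 * 100


def normalize_prediction(return_pct: float) -> float:
    """ASSUMPTION (not stated on the page): a Prediction Arena return r, in
    percent, maps to (r + 100) / 2, so -100% -> 0 and +100% -> 100."""
    return (return_pct + 100) / 2


def socratic_score(
    lm_arena_elo: Optional[float] = None,
    swe_bench_pct: Optional[float] = None,
    arc_agi_pct: Optional[float] = None,
    hle_pct: Optional[float] = None,
    vending_balance_usd: Optional[float] = None,
    prediction_return_pct: Optional[float] = None,
) -> Optional[float]:
    """Mean of the normalized scores a model has.

    ASSUMPTIONS: a missing benchmark (None, shown as "-" in the table) is
    skipped rather than counted as 0, and Vibe Bench is not included.
    Returns None when no benchmark is available.
    """
    parts = []
    if lm_arena_elo is not None:
        parts.append(normalize_lm_arena(lm_arena_elo))
    # SWE-bench, ARC-AGI and HLE are already percentages (0-100), used as-is.
    for pct in (swe_bench_pct, arc_agi_pct, hle_pct):
        if pct is not None:
            parts.append(pct)
    if vending_balance_usd is not None:
        parts.append(normalize_vending(vending_balance_usd))
    if prediction_return_pct is not None:
        parts.append(normalize_prediction(prediction_return_pct))
    if not parts:
        return None
    return sum(parts) / len(parts)


# Row 3 of the leaderboard: SWE-bench 72.8%, Vending $5,634.41, Prediction +43.5%.
score = socratic_score(swe_bench_pct=72.8, vending_balance_usd=5634.41,
                       prediction_return_pct=43.5)
print(round(score, 1))  # 67.0
```

Skipping missing benchmarks rather than counting them as zero is consistent with the table: the top-ranked entry has only a Vending-Bench balance and scores 72.0, matching $7,204.14 / $10k × 100 ≈ 72.04.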
AI Socratic Leaderboard
Scores across benchmarks
| # | Model | Score | LM Arena | SWE-bench | ARC-AGI-2 | HLE | Vending | Prediction | Vibe Bench |
|---|---|---|---|---|---|---|---|---|---|
| 🥇 | | 72.0 | - | - | - | - | $7,204.14 | - | 50% |
| 🥈 | | 70.0 | - | 70.0% | - | - | - | - | - |
| 🥉 | | 67.0 | - | 72.8% | - | - | $5,634.41 | +43.5% | - |
| 4 | | 64.3 | - | 75.6% | - | - | $8,017.59 | -25.8% | 50% |
| 5 | | 59.9 | - | - | - | - | - | +19.8% | 15% |
| 6 | | 56.0 | - | - | - | - | $6,144.18 | +1.2% | 25% |
Vibe Bench
Community favorites · Socratic Feb · 40 responses
| Choice | Count |
|---|---|
| | 20 |
| | 13 |
| | 10 |
| | 6 |
| | 6 |
| Open Source | 5 |
| | 4 |
| Other Tools | 3 |
| | 1 |
| | 0 |
| | 0 |
Vibe Bench: trends
Mentions per event (absolute count)
SWE-bench Bash (Verified)
Real-world software engineering tasks
| # | Model | % Resolved |
|---|---|---|
| 🥇 | | 76.8% |
| 🥈 | | 75.8% |
| 🥉 | | 75.8% |
| 4 | | 75.6% |
| 5 | | 72.8% |
| 6 | | 72.8% |
| 7 | | 72.8% |
| 8 | | 72.8% |
| 9 | | 71.4% |
| 10 | Kimi K2.5 (high reasoning) | 70.8% |
ARC-AGI-2 (Semi-Private)
Abstract reasoning capabilities
| # | Model | Score |
|---|---|---|
HLE
Expert-level reasoning across disciplines
| # | Model | Accuracy |
|---|---|---|
| 🥇 | | 38.3% |
| 🥈 | | 25.3% |
| 🥉 | | 24.5% |
| 4 | | 21.6% |
| 5 | | 19.4% |
| 6 | | 13.7% |
| 7 | | 12.1% |
| 8 | | 8.5% |
| 9 | | 8.0% |
| 10 | | 2.7% |
Vending-Bench 2 (Andon Labs)
Long-term agentic coherence
| # | Model | Balance |
|---|---|---|
| 🥇 | | $10,936.76 |
| 🥈 | | $8,017.59 |
| 🥉 | | $7,204.14 |
| 4 | | $6,144.18 |
| 5 | | $5,940.12 |
| 6 | | $5,634.41 |
| 7 | | $5,478.16 |
| 8 | | $5,114.87 |
| 9 | | $4,967.06 |
| 10 | | $4,662.85 |
Prediction Arena (Arcada Labs)
AI prediction market performance
| # | Agent | Return | Sharpe |
|---|---|---|---|
| 🥇 | | 43.5% | 0.04 |
| 🥈 | | 19.8% | 0.03 |
| 🥉 | | 1.2% | 0.03 |
| 4 | | -15.4% | -0.12 |
| 5 | Mystery Model Alpha | -20.0% | -0.05 |
| 6 | | -25.8% | -0.02 |
| 7 | | -26.6% | -0.10 |
| 8 | | -26.8% | -0.09 |
| 9 | | -30.8% | -0.07 |
| 10 | | -31.0% | -0.11 |