Dashboards
Socratic Score = mean of the normalized benchmark scores. LM Arena: (Elo - 1000) / 400 × 100. Vending-Bench: balance / $10,000 × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
AI Socratic Leaderboard
Scores across benchmarks
| # | Model | Score | LM Arena | SWE-bench | ARC-AGI-2 | HLE | Vending | Prediction | Vibe Bench |
|---|---|---|---|---|---|---|---|---|---|
| 🥇 | | 86.3 | - | - | - | - | - | +72.7% | - |
| 🥈 | | 72.0 | - | - | - | - | $7,204.14 | - | - |
| 🥉 | | 70.0 | - | 70.0% | - | - | - | - | - |
| 4 | | 68.7 | - | 72.8% | - | - | $5,634.41 | +53.8% | - |
| 5 | | 64.2 | - | 75.6% | - | - | $8,017.59 | -26.5% | - |
| 6 | | 56.7 | - | 76.8% | - | - | - | -26.6% | - |
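A minimal Python sketch of the Socratic Score normalization, useful for checking the scores in the leaderboard. The function names and dictionary keys are hypothetical. Two details are not stated on the dashboard and are inferred from the table, so treat them as assumptions: benchmarks shown as "-" appear to be left out of the mean (row 3 scores 70.0 from a single 70.0% SWE-bench result), and the Prediction column, which the formula line does not mention, appears to be mapped as (return + 100) / 2 (this matches the listed scores for rows 4 and 5).

```python
from statistics import mean
from typing import Optional


def normalize(benchmark: str, value: float) -> float:
    """Map one raw benchmark value onto a 0-100 scale (hypothetical keys)."""
    if benchmark == "lm_arena":
        # Elo rating: 1000 -> 0, 1400 -> 100
        return (value - 1000) / 400 * 100
    if benchmark == "vending":
        # Final balance in dollars: $10,000 -> 100
        return value / 10_000 * 100
    if benchmark == "prediction":
        # ASSUMED mapping inferred from the table: -100% -> 0, 0% -> 50, +100% -> 100
        return (value + 100) / 2
    # SWE-bench, ARC-AGI-2, HLE: already percentages, used as-is
    return value


def socratic_score(raw: dict[str, Optional[float]]) -> float:
    """Mean of the normalized scores, ASSUMING missing ("-") benchmarks are skipped."""
    scores = [normalize(name, v) for name, v in raw.items() if v is not None]
    return round(mean(scores), 1)


# Row 4 of the leaderboard: SWE-bench 72.8%, Vending $5,634.41, Prediction +53.8%
row4 = {"lm_arena": None, "swe": 72.8, "vending": 5634.41, "prediction": 53.8}
print(socratic_score(row4))  # 68.7
```

Counting missing benchmarks as zero instead would pull row 3 well below its listed 70.0, which is why the sketch drops them from the mean.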
Vibe Bench
Community favorites · Socratic Feb · 0 responses
| Category | Responses |
|---|---|
| Open Source | 0 |
| Other Tools | 0 |
Vibe Bench: trends
Mentions per event (absolute count)
SWE-bench Bash (Verified)
Real-world software engineering tasks
| # | Model | % Resolved |
|---|---|---|
| 🥇 | | 76.8% |
| 🥈 | | 75.8% |
| 🥉 | | 75.8% |
| 4 | | 75.6% |
| 5 | | 72.8% |
| 6 | | 72.8% |
| 7 | | 72.8% |
| 8 | | 72.8% |
| 9 | | 71.4% |
| 10 | Kimi K2.5 (high reasoning) | 70.8% |
ARC-AGI-2 (Semi-Private)
Abstract reasoning capabilities
| # | Model | Score |
|---|---|---|
Humanity's Last Exam (HLE)
Expert-level reasoning across disciplines
| # | Model | Accuracy |
|---|---|---|
| 🥇 | | 38.3% |
| 🥈 | | 25.3% |
| 🥉 | | 24.5% |
| 4 | | 21.6% |
| 5 | | 19.4% |
| 6 | | 13.7% |
| 7 | | 12.1% |
| 8 | | 8.5% |
| 9 | | 8.0% |
| 10 | | 2.7% |
Vending-Bench 2 (Andon Labs)
Long-term agentic coherence
| # | Model | Balance |
|---|---|---|
| 🥇 | | $10,936.76 |
| 🥈 | | $8,017.59 |
| 🥉 | | $7,523.84 |
| 4 | | $7,204.14 |
| 5 | Kimi K2.6 | $6,204.57 |
| 6 | | $6,144.18 |
| 7 | | $5,940.12 |
| 8 | | $5,634.41 |
| 9 | | $5,478.16 |
| 10 | | $5,396.42 |
Prediction Arena (Arcada Labs)
AI prediction market performance
| # | Agent | Return | Sharpe |
|---|---|---|---|
| 🥇 | | 72.7% | 0.03 |
| 🥈 | | 53.8% | 0.03 |
| 🥉 | | 1.2% | 0.02 |
| 4 | | -15.4% | -0.12 |
| 5 | Mystery Model Alpha | -20.0% | -0.05 |
| 6 | | -26.5% | -0.01 |
| 7 | | -26.6% | -0.10 |
| 8 | | -26.8% | -0.09 |
| 9 | | -30.8% | -0.07 |
| 10 | | -31.0% | -0.11 |