AI Socratic

Dashboards

Socratic Score = mean of normalized benchmark scores. LM Arena: (Elo - 1000) / 400 × 100. Vending-Bench: balance / $10,000 × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
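
A minimal Python sketch of the formula above. The benchmark keys (`lm_arena`, `vending`, `swe_bench`, `arc_agi_2`, `hle`) and the function names are hypothetical. Two details are inferred from the leaderboard rather than stated: benchmarks a model has no result for are left out of the mean (Gemini 3 Flash's 75.8 equals its SWE-bench result alone), and each normalized score appears to be capped at 100 (Claude Opus 4.7's $10,936.76 balance would otherwise give 109.4, not the listed 100.0). The formula does not say how Prediction or Vibe Bench enter the score, so the sketch leaves them out.

```python
from statistics import mean

# Benchmarks that are already reported as a 0-100 percentage.
PERCENT_BENCHMARKS = {"swe_bench", "arc_agi_2", "hle"}


def normalize(benchmark: str, value: float) -> float:
    """Map one raw benchmark result onto the 0-100 scale described above."""
    if benchmark == "lm_arena":
        # Elo rating: 1000 maps to 0, 1400 maps to 100.
        return (value - 1000) / 400 * 100
    if benchmark == "vending":
        # Vending-Bench final balance in dollars: $10,000 maps to 100.
        return value / 10_000 * 100
    if benchmark in PERCENT_BENCHMARKS:
        # Already a percentage, used as-is.
        return value
    raise ValueError(f"no normalization rule for {benchmark!r}")


def socratic_score(results: dict[str, float], cap: float = 100.0) -> float:
    """Mean of the normalized scores a model has; missing benchmarks are simply absent.

    Assumption (inferred from the table, not stated on the page): each normalized
    score is capped at `cap`, so a Vending-Bench balance above $10,000 counts as 100.
    """
    normalized = [min(normalize(name, value), cap) for name, value in results.items()]
    return mean(normalized)


if __name__ == "__main__":
    # Rows from the leaderboard, using only the benchmarks the formula covers.
    print(round(socratic_score({"vending": 7523.84}), 1))   # 75.2 (GPT 5.5)
    print(round(socratic_score({"swe_bench": 75.8}), 1))    # 75.8 (Gemini 3 Flash)
    print(round(socratic_score({"vending": 10936.76}), 1))  # 100.0 (Claude Opus 4.7, capped)
```

Passing an empty dict raises `statistics.StatisticsError`, which is a reasonable signal that a model has no scorable results.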

AI Socratic Leaderboard

Scores across benchmarks

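| # | Developer | Model | Score | LM Arena | SWE-bench | ARC-AGI-2 | HLE | Vending | Prediction | Vibe Bench |
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | Anthropic | Claude Opus 4.7 | 100.0 | - | - | - | - | $10,936.76 | - | 50% |
| 🥈 | Google | Gemini 3.1 Pro | 87.1 | - | - | - | - | - | +74.3% | 15% |
| 🥉 | Google | Gemini 3 Flash | 75.8 | - | 75.8% | - | - | - | - | 15% |
| 4 | MiniMax | MiniMax M2.5 | 75.8 | - | 75.8% | - | - | - | - | - |
| 5 | OpenAI | GPT 5.5 | 75.2 | - | - | - | - | $7,523.84 | - | 25% |
| 6 | Anthropic | Claude Sonnet 4.6 | 72.0 | - | - | - | - | $7,204.14 | - | 50% |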

Vibe Bench

Community favorites · Socratic Feb · 40 responses

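| Developer | Tool | Mentions |
|---|---|---|
| Anthropic | Claude | 20 |
| Anthropic | Claude Code | 13 |
| OpenAI | ChatGPT | 10 |
| Google | Gemini | 6 |
| OpenAI | Codex | 6 |
| | Open Source | 5 |
| Cursor | Cursor | 4 |
| | Other Tools | 3 |
| xAI | Grok | 1 |
| Perplexity | Perplexity | 0 |
| Windsurf | Windsurf | 0 |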
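
The leaderboard's Vibe Bench column appears to be each vendor's mentions divided by the 40 responses: Claude 20/40 = 50% (the Claude Opus 4.7 and Claude Sonnet 4.6 rows), ChatGPT 10/40 = 25% (GPT 5.5), Gemini 6/40 = 15%. The counts add up to 68, so respondents could name more than one tool and the shares need not sum to 100%. A small Python sketch under that assumption; the mapping from poll tool to leaderboard row is inferred, not stated on the page, and `mention_share` is a hypothetical helper.

```python
# Mention counts from the "Socratic Feb" Vibe Bench poll (40 responses).
MENTIONS = {
    "Claude": 20,
    "Claude Code": 13,
    "ChatGPT": 10,
    "Gemini": 6,
    "Codex": 6,
    "Open Source": 5,
    "Cursor": 4,
    "Other Tools": 3,
    "Grok": 1,
    "Perplexity": 0,
    "Windsurf": 0,
}
RESPONSES = 40


def mention_share(mentions: dict[str, int], responses: int) -> dict[str, float]:
    """Percentage of respondents who mentioned each tool (shares can sum past 100%)."""
    return {tool: 100 * count / responses for tool, count in mentions.items()}


shares = mention_share(MENTIONS, RESPONSES)
for tool in ("Claude", "ChatGPT", "Gemini"):
    print(f"{tool}: {shares[tool]:.0f}%")
# prints, one per line: Claude: 50%, ChatGPT: 25%, Gemini: 15%

print(sum(MENTIONS.values()))  # 68 mentions from 40 responses
```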

Vibe Bench: trends

Mentions per event (absolute count)

SWE-bench

Real-world software engineering tasks

| # | Developer | Model | % Resolved |
|---|---|---|---|
| 🥇 | Anthropic | Claude 4.5 Opus (high reasoning) | 76.8% |
| 🥈 | Google | Gemini 3 Flash (high reasoning) | 75.8% |
| 🥉 | MiniMax | MiniMax M2.5 (high reasoning) | 75.8% |
| 4 | Anthropic | Claude Opus 4.6 | 75.6% |
| 5 | OpenAI | GPT-5-2 Codex | 72.8% |
| 6 | Zhipu AI | GLM-5 (high reasoning) | 72.8% |
| 7 | OpenAI | GPT-5-2 (high reasoning) | 72.8% |
| 8 | OpenAI | GPT 5.2 Codex | 72.8% |
| 9 | Anthropic | Claude 4.5 Sonnet (high reasoning) | 71.4% |
| 10 | | Kimi K2.5 (high reasoning) | 70.8% |

ARC-AGI-2 Semi-Private

Abstract reasoning capabilities

| # | Developer | Model | Score |
|---|---|---|---|

HLE (Humanity's Last Exam)

Expert-level reasoning across disciplines

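| # | Developer | Model | Accuracy |
|---|---|---|---|
| 🥇 | Google | Gemini 3 Pro | 38.3% |
| 🥈 | OpenAI | GPT-5 | 25.3% |
| 🥉 | xAI | Grok 4 | 24.5% |
| 4 | Google | Gemini 2.5 Pro | 21.6% |
| 5 | OpenAI | GPT-5-mini | 19.4% |
| 6 | Anthropic | Claude 4.5 Sonnet | 13.7% |
| 7 | Google | Gemini 2.5 Flash | 12.1% |
| 8 | DeepSeek | DeepSeek-R1* | 8.5% |
| 9 | OpenAI | o1 | 8.0% |
| 10 | OpenAI | GPT-4o | 2.7% |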

Vending-Bench

Long-term agentic coherence

| # | Developer | Model | Balance |
|---|---|---|---|
| 🥇 | Anthropic | Claude Opus 4.7 | $10,936.76 |
| 🥈 | Anthropic | Claude Opus 4.6 | $8,017.59 |
| 🥉 | OpenAI | GPT-5.5 (new) | $7,523.84 |
| 4 | Anthropic | Claude Sonnet 4.6 | $7,204.14 |
| 5 | | Kimi K2.6 (new) | $6,204.57 |
| 6 | OpenAI | GPT-5.4 | $6,144.18 |
| 7 | OpenAI | GPT-5.3-Codex | $5,940.12 |
| 8 | Anthropic | Claude Opus 4.8 - High (new) | $5,787.43 |
| 9 | Zhipu AI | GLM-5.1 | $5,634.41 |
| 10 | Google | Gemini 3 Pro | $5,478.16 |

Prediction

AI prediction market performance

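| # | Developer | Agent | Return | Sharpe |
|---|---|---|---|---|
| 🥇 | Google | Gemini 3.1 Pro | 74.3% | 0.03 |
| 🥈 | Zhipu AI | GLM 5 | 37.4% | 0.03 |
| 🥉 | OpenAI | GPT 5.4 | 1.2% | 0.02 |
| 4 | Zhipu AI | GLM 4.7 | -15.4% | -0.12 |
| 5 | | Mystery Model Alpha | -20.0% | -0.05 |
| 6 | Anthropic | Claude Opus 4.6 | -26.5% | -0.01 |
| 7 | Anthropic | Claude Opus 4.5 | -26.6% | -0.10 |
| 8 | OpenAI | GPT 5.2 | -26.8% | -0.09 |
| 9 | xAI | Grok 4.1 | -30.8% | -0.07 |
| 10 | Google | Gemini 3 Pro | -31.0% | -0.11 |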
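
The page does not define its Sharpe column: the return frequency, the risk-free rate, and whether the figure is annualized are all unstated. For orientation only, here is a Python sketch of the standard per-period Sharpe ratio, i.e. mean excess return divided by the sample standard deviation of those returns. The function name, the zero default risk-free rate, and the sample return series are assumptions for illustration, not the prediction-market benchmark's own method.

```python
from statistics import mean, stdev


def sharpe_ratio(period_returns: list[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return / sample std dev (not annualized).

    Assumptions: returns are fractional per-period returns (0.02 == 2%),
    and `risk_free` is the per-period risk-free rate (0 by default).
    Needs at least two returns with non-zero spread.
    """
    excess = [r - risk_free for r in period_returns]
    return mean(excess) / stdev(excess)


if __name__ == "__main__":
    # Hypothetical return series, not data from the table above.
    print(round(sharpe_ratio([0.02, -0.01, 0.03, 0.0]), 2))  # 0.55
```

Because an annualized Sharpe ratio is usually the per-period value scaled by the square root of the number of periods per year, values from different sources are only comparable when the period is known.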
