AI Socratic

AI Leaderboard

Socratic Score = mean of normalized benchmarks. LM Arena: (ELO - 1000) / 400 × 100. Vending-Bench: balance / $10k × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
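
The normalization above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the site's actual implementation: the function and parameter names are hypothetical, missing benchmarks are assumed to be skipped rather than counted as zero, and no clamping is applied because the description mentions none. The description gives no rule for the Prediction column; the `(return + 100) / 2` mapping used here is an inference that happens to reproduce the scores listed in the leaderboard, so treat it as a guess.

```python
from statistics import mean
from typing import Optional


def normalize_lm_arena(elo: float) -> float:
    """LM Arena: (ELO - 1000) / 400 x 100, as stated in the description."""
    return (elo - 1000) / 400 * 100


def normalize_vending(balance_usd: float) -> float:
    """Vending-Bench: balance / $10k x 100, as stated in the description."""
    return balance_usd / 10_000 * 100


def normalize_prediction(return_pct: float) -> float:
    """ASSUMPTION: not stated on the page; inferred from the listed scores."""
    return (return_pct + 100) / 2


def socratic_score(
    *,
    lm_arena_elo: Optional[float] = None,
    swe_bench: Optional[float] = None,
    arc_agi: Optional[float] = None,
    hle: Optional[float] = None,
    vending_balance: Optional[float] = None,
    prediction_return: Optional[float] = None,
) -> Optional[float]:
    """Mean of the normalized benchmarks that are present (hypothetical helper)."""
    parts = []
    if lm_arena_elo is not None:
        parts.append(normalize_lm_arena(lm_arena_elo))
    # SWE-bench, ARC-AGI, and HLE are already on a 0-100 scale and used as-is.
    parts.extend(p for p in (swe_bench, arc_agi, hle) if p is not None)
    if vending_balance is not None:
        parts.append(normalize_vending(vending_balance))
    if prediction_return is not None:
        parts.append(normalize_prediction(prediction_return))
    # ASSUMPTION: benchmarks without a result are skipped, not treated as 0.
    return round(mean(parts), 1) if parts else None


# Inputs taken from the leaderboard rows for Claude Sonnet 4.6 and Claude Opus 4.6.
print(socratic_score(vending_balance=7204.14))
print(socratic_score(swe_bench=75.6, vending_balance=8017.59, prediction_return=-25.8))
```

With these assumptions the two calls print 72.0 and 64.3, matching the scores shown for those models. Vibe Bench appears to be left out of the mean, since including it would not reproduce the listed values.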

AI Socratic Leaderboard

Scores across benchmarks

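| # | Provider | Model | Score | LM Arena | SWE-bench | ARC-AGI-2 | HLE | Vending | Prediction | Vibe Bench |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Anthropic | Claude Sonnet 4.6 | 72.0 | - | - | - | - | $7,204.14 | - | 50% |
| 2 | DeepSeek | DeepSeek V3 | 70.0 | - | 70.0% | - | - | - | - | - |
| 3 | Zhipu | GLM 5 | 67.0 | - | 72.8% | - | - | $5,634.41 | +43.5% | - |
| 4 | Anthropic | Claude Opus 4.6 | 64.3 | - | 75.6% | - | - | $8,017.59 | -25.8% | 50% |
| 5 | Google | Gemini 3.1 Pro | 59.9 | - | - | - | - | - | +19.8% | 15% |
| 6 | OpenAI | GPT 5.4 | 56.0 | - | - | - | - | $6,144.18 | +1.2% | 25% |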

Vibe Bench

Community favorites · Socratic Feb · 40 responses

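| Provider | Tool | Mentions |
|---|---|---|
| Anthropic | Claude | 20 |
| Anthropic | Claude Code | 13 |
| OpenAI | ChatGPT | 10 |
| Google | Gemini | 6 |
| OpenAI | Codex | 6 |
| - | Open Source | 5 |
| Cursor | Cursor | 4 |
| - | Other Tools | 3 |
| xAI | Grok | 1 |
| Perplexity | Perplexity | 0 |
| Windsurf | Windsurf | 0 |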
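
The page does not say how the Vibe Bench percentages in the leaderboard are derived from these counts. One reading that fits the numbers is mentions divided by the 40 responses. The sketch below assumes that reading; the helper name `vibe_share` is hypothetical, and the link between a tool here and a model row in the leaderboard (for example, Claude to the Claude 4.6 models) is likewise an assumption.

```python
def vibe_share(mentions: int, responses: int) -> float:
    """Share of respondents mentioning a tool, in percent (hypothetical helper)."""
    if responses <= 0:
        raise ValueError("responses must be positive")
    # Multiply before dividing to keep the float result exact for these inputs.
    return mentions * 100 / responses


RESPONSES = 40  # "40 responses" from the Vibe Bench caption
counts = {"Claude": 20, "ChatGPT": 10, "Gemini": 6}  # counts from the list above

shares = {tool: vibe_share(n, RESPONSES) for tool, n in counts.items()}
for tool, share in shares.items():
    print(f"{tool}: {share:.0f}%")
```

Under this assumption the shares come out to 50%, 25%, and 15%, the same values shown in the Vibe Bench column of the leaderboard. The counts add up to more than 40, which suggests respondents could name several tools.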

Vibe Bench: trends

Mentions per event (absolute count)

SWE-bench

Real-world software engineering tasks

| # | Provider | Model | % Resolved |
|---|---|---|---|
| 1 | Anthropic | Claude 4.5 Opus (high reasoning) | 76.8% |
| 2 | Google | Gemini 3 Flash (high reasoning) | 75.8% |
| 3 | MiniMax | MiniMax M2.5 (high reasoning) | 75.8% |
| 4 | Anthropic | Claude Opus 4.6 | 75.6% |
| 5 | OpenAI | GPT-5-2 Codex | 72.8% |
| 6 | Zhipu AI | GLM-5 (high reasoning) | 72.8% |
| 7 | OpenAI | GPT-5-2 (high reasoning) | 72.8% |
| 8 | OpenAI | GPT 5.2 Codex | 72.8% |
| 9 | Anthropic | Claude 4.5 Sonnet (high reasoning) | 71.4% |
| 10 | - | Kimi K2.5 (high reasoning) | 70.8% |

ARC-AGI-2 Semi-Private

Abstract reasoning capabilities

| # | Model | Score |
|---|---|---|

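HLE

Expert-level reasoning across disciplines

| # | Provider | Model | Accuracy |
|---|---|---|---|
| 1 | Google | Gemini 3 Pro | 38.3% |
| 2 | OpenAI | GPT-5 | 25.3% |
| 3 | xAI | Grok 4 | 24.5% |
| 4 | Google | Gemini 2.5 Pro | 21.6% |
| 5 | OpenAI | GPT-5-mini | 19.4% |
| 6 | Anthropic | Claude 4.5 Sonnet | 13.7% |
| 7 | Google | Gemini 2.5 Flash | 12.1% |
| 8 | DeepSeek | DeepSeek-R1* | 8.5% |
| 9 | OpenAI | o1 | 8.0% |
| 10 | OpenAI | GPT-4o | 2.7% |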

Vending-Bench

Long-term agentic coherence

| # | Provider | Model | Balance |
|---|---|---|---|
| 1 | Anthropic | Claude Opus 4.7 (new) | $10,936.76 |
| 2 | Anthropic | Claude Opus 4.6 | $8,017.59 |
| 3 | Anthropic | Claude Sonnet 4.6 | $7,204.14 |
| 4 | OpenAI | GPT-5.4 | $6,144.18 |
| 5 | OpenAI | GPT-5.3-Codex | $5,940.12 |
| 6 | Zhipu AI | GLM-5.1 | $5,634.41 |
| 7 | Google | Gemini 3 Pro | $5,478.16 |
| 8 | Alibaba | Qwen 3.6 Plus | $5,114.87 |
| 9 | Anthropic | Claude Opus 4.5 | $4,967.06 |
| 10 | xAI | Grok 4.20 | $4,662.85 |

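Prediction

AI prediction market performance

| # | Provider | Agent | Return | Sharpe |
|---|---|---|---|---|
| 1 | Zhipu AI | GLM 5 | 43.5% | 0.04 |
| 2 | Google | Gemini 3.1 Pro | 19.8% | 0.03 |
| 3 | OpenAI | GPT 5.4 | 1.2% | 0.03 |
| 4 | Zhipu AI | GLM 4.7 | -15.4% | -0.12 |
| 5 | - | Mystery Model Alpha | -20.0% | -0.05 |
| 6 | Anthropic | Claude Opus 4.6 | -25.8% | -0.02 |
| 7 | Anthropic | Claude Opus 4.5 | -26.6% | -0.10 |
| 8 | OpenAI | GPT 5.2 | -26.8% | -0.09 |
| 9 | xAI | Grok 4.1 | -30.8% | -0.07 |
| 10 | Google | Gemini 3 Pro | -31.0% | -0.11 |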
