AI Socratic

Dashboards

Socratic Score = mean of normalized benchmarks. LM Arena: (ELO - 1000) / 400 × 100. Vending-Bench: balance / $10k × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
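
As a minimal sketch, the normalization described above could be computed as follows. The function and parameter names are hypothetical. The sketch assumes that benchmarks a model has no result for are skipped rather than counted as zero. It leaves out the Prediction column because the formula above does not say how returns are normalized.

```python
from statistics import mean
from typing import Optional


def normalize_lm_arena(elo: float) -> float:
    """LM Arena: (ELO - 1000) / 400 x 100, as stated in the formula."""
    return (elo - 1000) / 400 * 100


def normalize_vending(balance_usd: float) -> float:
    """Vending-Bench: balance / $10k x 100, as stated in the formula."""
    return balance_usd / 10_000 * 100


def socratic_score(
    *,
    lm_arena_elo: Optional[float] = None,
    swe_bench_pct: Optional[float] = None,
    arc_agi_pct: Optional[float] = None,
    hle_pct: Optional[float] = None,
    vending_balance_usd: Optional[float] = None,
) -> Optional[float]:
    """Mean of the normalized benchmarks that are present.

    Assumption (not stated in the source): missing benchmarks are
    skipped instead of being treated as 0.
    """
    parts = []
    if lm_arena_elo is not None:
        parts.append(normalize_lm_arena(lm_arena_elo))
    if vending_balance_usd is not None:
        parts.append(normalize_vending(vending_balance_usd))
    # SWE-bench, ARC-AGI, and HLE are already on a 0-100 scale.
    for pct in (swe_bench_pct, arc_agi_pct, hle_pct):
        if pct is not None:
            parts.append(pct)
    return mean(parts) if parts else None


# A model with only a Vending-Bench balance of $7,204.14:
print(round(socratic_score(vending_balance_usd=7204.14), 1))
```

Under these assumptions, a model whose only result is a Vending-Bench balance of $7,204.14 gets a score of 72.0. Rows that also carry a Prediction return cannot be reproduced this way, since that normalization is not given.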

AI Socratic Leaderboard

Scores across benchmarks

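| # | Provider | Model | Score | LM Arena | SWE-bench | ARC-AGI-2 | HLE | Vending | Prediction | Vibe Bench |
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | Google | Gemini 3.1 Pro | 86.3 | - | - | - | - | - | +72.7% | - |
| 🥈 | Anthropic | Claude Sonnet 4.6 | 72.0 | - | - | - | - | $7,204.14 | - | - |
| 🥉 | DeepSeek | DeepSeek V3 | 70.0 | - | 70.0% | - | - | - | - | - |
| 4 | Zhipu | GLM 5 | 68.7 | - | 72.8% | - | - | $5,634.41 | +53.8% | - |
| 5 | Anthropic | Claude Opus 4.6 | 64.2 | - | 75.6% | - | - | $8,017.59 | -26.5% | - |
| 6 | Anthropic | Claude Opus 4.5 | 56.7 | - | 76.8% | - | - | - | -26.6% | - |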

Vibe Bench

Community favorites · Socratic Feb · 0 responses

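| Provider | Favorite | Responses |
|---|---|---|
| Anthropic | Claude | 0 |
| OpenAI | ChatGPT | 0 |
| Google | Gemini | 0 |
| xAI | Grok | 0 |
| | Open Source | 0 |
| Anthropic | Claude Code | 0 |
| Cursor | Cursor | 0 |
| OpenAI | Codex | 0 |
| Perplexity | Perplexity | 0 |
| Windsurf | Windsurf | 0 |
| | Other Tools | 0 |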

Vibe Bench: trends

Mentions per event (absolute count)

Real-world software engineering tasks

| # | Provider | Model | % Resolved |
|---|---|---|---|
| 🥇 | Anthropic | Claude 4.5 Opus (high reasoning) | 76.8% |
| 🥈 | Google | Gemini 3 Flash (high reasoning) | 75.8% |
| 🥉 | MiniMax | MiniMax M2.5 (high reasoning) | 75.8% |
| 4 | Anthropic | Claude Opus 4.6 | 75.6% |
| 5 | OpenAI | GPT-5-2 Codex | 72.8% |
| 6 | Zhipu AI | GLM-5 (high reasoning) | 72.8% |
| 7 | OpenAI | GPT-5-2 (high reasoning) | 72.8% |
| 8 | OpenAI | GPT 5.2 Codex | 72.8% |
| 9 | Anthropic | Claude 4.5 Sonnet (high reasoning) | 71.4% |
| 10 | | Kimi K2.5 (high reasoning) | 70.8% |

ARC-AGI-2 Semi-Private

Abstract reasoning capabilities

| # | Model | Score |
|---|---|---|

Expert-level reasoning across disciplines

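| # | Provider | Model | Accuracy |
|---|---|---|---|
| 🥇 | Google | Gemini 3 Pro | 38.3% |
| 🥈 | OpenAI | GPT-5 | 25.3% |
| 🥉 | xAI | Grok 4 | 24.5% |
| 4 | Google | Gemini 2.5 Pro | 21.6% |
| 5 | OpenAI | GPT-5-mini | 19.4% |
| 6 | Anthropic | Claude 4.5 Sonnet | 13.7% |
| 7 | Google | Gemini 2.5 Flash | 12.1% |
| 8 | DeepSeek | DeepSeek-R1* | 8.5% |
| 9 | OpenAI | o1 | 8.0% |
| 10 | OpenAI | GPT-4o | 2.7% |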

Long-term agentic coherence

| # | Provider | Model | Balance |
|---|---|---|---|
| 🥇 | Anthropic | Claude Opus 4.7 | $10,936.76 |
| 🥈 | Anthropic | Claude Opus 4.6 | $8,017.59 |
| 🥉 | OpenAI | GPT-5.5 (New) | $7,523.84 |
| 4 | Anthropic | Claude Sonnet 4.6 | $7,204.14 |
| 5 | | Kimi K2.6 (New) | $6,204.57 |
| 6 | OpenAI | GPT-5.4 | $6,144.18 |
| 7 | OpenAI | GPT-5.3-Codex | $5,940.12 |
| 8 | Zhipu AI | GLM-5.1 | $5,634.41 |
| 9 | Google | Gemini 3 Pro | $5,478.16 |
| 10 | Google | Gemini 3.5 Flash | $5,396.42 |

AI prediction market performance

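| # | Provider | Agent | Return | Sharpe |
|---|---|---|---|---|
| 🥇 | Google | Gemini 3.1 Pro | 72.7% | 0.03 |
| 🥈 | Zhipu AI | GLM 5 | 53.8% | 0.03 |
| 🥉 | OpenAI | GPT 5.4 | 1.2% | 0.02 |
| 4 | Zhipu AI | GLM 4.7 | -15.4% | -0.12 |
| 5 | | Mystery Model Alpha | -20.0% | -0.05 |
| 6 | Anthropic | Claude Opus 4.6 | -26.5% | -0.01 |
| 7 | Anthropic | Claude Opus 4.5 | -26.6% | -0.10 |
| 8 | OpenAI | GPT 5.2 | -26.8% | -0.09 |
| 9 | xAI | Grok 4.1 | -30.8% | -0.07 |
| 10 | Google | Gemini 3 Pro | -31.0% | -0.11 |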
