Dashboards

Socratic Score = mean of normalized benchmarks. LM Arena: (Elo - 1000) / 400 × 100. Vending-Bench: balance / $10,000 × 100. SWE-bench, ARC-AGI, and HLE are used as-is (0-100%).
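As a rough illustration, the normalization above can be sketched in Python. The function and parameter names are hypothetical. Two behaviors are assumptions inferred from the leaderboard rows, not stated rules: benchmarks a model has no result for are skipped, and each normalized value is clamped to 0-100 (a $10,936.76 balance is listed with a score of 100.0). Prediction Arena is left out because its normalization is not given.

```python
from typing import Optional


def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a normalized value inside the 0-100 range (assumed, see lead-in)."""
    return max(low, min(high, value))


def normalize_lm_arena(elo: float) -> float:
    """LM Arena: (Elo - 1000) / 400 x 100."""
    return (elo - 1000) / 400 * 100


def normalize_vending(balance_usd: float) -> float:
    """Vending-Bench: balance / $10,000 x 100."""
    return balance_usd / 10_000 * 100


def socratic_score(
    lm_arena_elo: Optional[float] = None,
    swe_bench_pct: Optional[float] = None,
    arc_agi_pct: Optional[float] = None,
    hle_pct: Optional[float] = None,
    vending_balance_usd: Optional[float] = None,
) -> Optional[float]:
    """Mean of whichever normalized benchmarks are present, rounded to 0.1."""
    parts = []
    if lm_arena_elo is not None:
        parts.append(clamp(normalize_lm_arena(lm_arena_elo)))
    if vending_balance_usd is not None:
        parts.append(clamp(normalize_vending(vending_balance_usd)))
    # SWE-bench, ARC-AGI, and HLE are already percentages and used as-is.
    for pct in (swe_bench_pct, arc_agi_pct, hle_pct):
        if pct is not None:
            parts.append(clamp(pct))
    if not parts:
        return None  # no benchmark data, so no score
    return round(sum(parts) / len(parts), 1)


if __name__ == "__main__":
    print(socratic_score(vending_balance_usd=7523.84))   # a Vending-Bench-only model
    print(socratic_score(swe_bench_pct=75.8))            # a SWE-bench-only model
```

Under these assumptions the sketch reproduces the single-benchmark rows in the leaderboard, such as 75.2 for a $7,523.84 balance.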

AI Socratic Leaderboard

Scores across benchmarks

#	Model	Score	LM Arena	SWE-bench	ARC-AGI-2	HLE	Vending	Prediction	Vibe Bench
🥇	Claude Opus 4.7	100.0	-	-	-	-	$10,936.76	-	50%
🥈	Gemini 3.1 Pro	87.1	-	-	-	-	-	+74.3%	15%
🥉	Gemini 3 Flash	75.8	-	75.8%	-	-	-	-	15%
4	MiniMax M2.5	75.8	-	75.8%	-	-	-	-	-
5	GPT 5.5	75.2	-	-	-	-	$7,523.84	-	25%
6	Claude Sonnet 4.6	72.0	-	-	-	-	$7,204.14	-	50%

Vibe Bench

Community favorites · Socratic Feb · 40 responses

Claude	20
Claude Code	13
ChatGPT	10
Gemini	6
Codex	6
Open Source	5
Cursor	4
Other Tools	3
Grok	1
Perplexity	0
Windsurf	0
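The Vibe Bench percentages in the leaderboard appear to be mentions divided by the 40 responses (for example, 20 / 40 = 50% for Claude). That relationship is inferred from the numbers, not stated on the page. Mentions total more than 40, so respondents evidently could name more than one favorite. A minimal sketch under that assumption, with hypothetical names:

```python
RESPONSES = 40  # "40 responses" from the survey header

# Mention counts as listed above.
MENTIONS = {
    "Claude": 20,
    "Claude Code": 13,
    "ChatGPT": 10,
    "Gemini": 6,
    "Codex": 6,
    "Open Source": 5,
    "Cursor": 4,
    "Other Tools": 3,
    "Grok": 1,
    "Perplexity": 0,
    "Windsurf": 0,
}


def vibe_share(mentions: int, responses: int = RESPONSES) -> float:
    """Percentage of respondents who mentioned a tool (assumed definition)."""
    if responses <= 0:
        raise ValueError("responses must be positive")
    return 100 * mentions / responses


SHARES = {name: vibe_share(count) for name, count in MENTIONS.items()}

if __name__ == "__main__":
    for name, share in SHARES.items():
        print(f"{name}: {share:.1f}%")
```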

Vibe Bench: trends

Mentions per event (absolute count)

SWE-bench Bash (Verified)

Real-world software engineering tasks

#	Model	% Resolved
🥇	Claude 4.5 Opus (high reasoning)	76.8%
🥈	Gemini 3 Flash (high reasoning)	75.8%
🥉	MiniMax M2.5 (high reasoning)	75.8%
4	Claude Opus 4.6	75.6%
5	GPT-5-2 Codex	72.8%
6	GLM-5 (high reasoning)	72.8%
7	GPT-5-2 (high reasoning)	72.8%
8	GPT 5.2 Codex	72.8%
9	Claude 4.5 Sonnet (high reasoning)	71.4%
10	Kimi K2.5 (high reasoning)	70.8%

ARC-AGI-2 (Semi-Private)

Abstract reasoning capabilities

#	Model	Score

Humanity's Last Exam (HLE)

Expert-level reasoning across disciplines

#	Model	Accuracy
🥇	Gemini 3 Pro	38.3%
🥈	GPT-5	25.3%
🥉	Grok 4	24.5%
4	Gemini 2.5 Pro	21.6%
5	GPT-5-mini	19.4%
6	Claude 4.5 Sonnet	13.7%
7	Gemini 2.5 Flash	12.1%
8	DeepSeek-R1*	8.5%
9	o1	8.0%
10	GPT-4o	2.7%

Vending-Bench 2 (Andon Labs)

Long-term agentic coherence

#	Model	Balance
🥇	Claude Opus 4.7	$10,936.76
🥈	Claude Opus 4.6	$8,017.59
🥉	GPT-5.5 (New)	$7,523.84
4	Claude Sonnet 4.6	$7,204.14
5	Kimi K2.6 (New)	$6,204.57
6	GPT-5.4	$6,144.18
7	GPT-5.3-Codex	$5,940.12
8	Claude Opus 4.8 - High (New)	$5,787.43
9	GLM-5.1	$5,634.41
10	Gemini 3 Pro	$5,478.16

Prediction Arena (Arcada Labs)

AI prediction market performance

#	Agent	Return	Sharpe
🥇	Gemini 3.1 Pro	74.3%	0.03
🥈	GLM 5	37.4%	0.03
🥉	GPT 5.4	1.2%	0.02
4	GLM 4.7	-15.4%	-0.12
5	Mystery Model Alpha	-20.0%	-0.05
6	Claude Opus 4.6	-26.5%	-0.01
7	Claude Opus 4.5	-26.6%	-0.10
8	GPT 5.2	-26.8%	-0.09
9	Grok 4.1	-30.8%	-0.07
10	Gemini 3 Pro	-31.0%	-0.11
