AI Socratic

xAI just launched Grok 4. xAI's own benchmarks present it as a new SOTA model, but accounts on Twitter tell a different story. Some of the highlights:

  • 100× more training than Grok 2 and 10× more RL compute than any other model (img1)
  • Grok 4 is single-agent, Grok 4 Heavy is multi-agent with higher performance (img2)
  • It achieves state-of-the-art results on most public benchmarks: HLE, AIME25, Vending-Bench, ARC-AGI and ARC-AGI-2 (img3)
  • Local benchmarks and hands-on testing tell a different story (img4, img5, img6)

Grok 4 has its Ghibli moment with the sex companions and the unhinged one:
