AI Socratic
July 2025

Grok 4 gets an AI companion

xAI just launched Grok 4. xAI's own benchmarks showed it as a new SOTA model, but accounts on Twitter told a different story. Some of the highlights include: 100× more training than Grok 2 and 10× more RL

Federico Ulfo

Kimi 2 👑: a new open-source 1T LLM

Kimi 2 is a new open-source model from Moonshot that uses an architecture similar to DeepSeek V3, with fewer heads and more experts. It's really cheap and fast, taking SOTA position on several benchm

Federico Ulfo

Let's enter the AI Code CLI war

First-ever AI Code CLI battle royale: claude-code, anon-kode, codex, opencode, ampcode, gemini. https://x.com/SIGKITTEN/status/1937950811910234377 Goog

Federico Ulfo

U-Net, a new recursive tokenizer

Avoids using predefined vocabs and memory-heavy embedding tables. Instead, it uses Autoregressive U-Nets to embed information directly from raw bytes. This enables infinite vocab size and more. https:

Federico Ulfo

A comment on "The Illusion of Thinking"

Pfizer researchers argue that what looks like a collapse in AI reasoning may actually be an agentic gap: models failing not in thought, but in action. When given tools, the same models crushed tasks

Federico Ulfo

Potemkin Understanding in LLMs

The paper documents a pattern the authors call Potemkins, a kind of reasoning inconsistency (see figure below). They show that LLMs, even models like o3, make these errors frequently. Gary Marcus: "You can

Federico Ulfo

Videos And Podcasts

https://x.com/karpathy/status/1935518272667217925
https://x.com/dwarkesh_sp/status/1938271893406310818

Federico Ulfo
