Skip to main content
AI Socratic
May 2026
Models

DeepSeek V4

DeepSeek just dropped V4 (preview) — two open-weights MoE models that push the frontier on cost-effective 1M-token context.

DeepSeek-V4-Pro: 1.6T total params (49B active) — flagship performance rivaling top closed models in reasoning, math, and agentic coding. DeepSeek-V4-Flash: 284B total (13B active) — faster, cheaper, and highly efficient for everyday/agent tasks.

image.png

Both feature a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) that makes million-token contexts dramatically more practical (much lower FLOPs and KV cache than V3). MIT license, available on Hugging Face (base + instruct), and live on the DeepSeek API today.

The community is already praising the efficiency gains, strong coding/agent results (e.g., high LiveCodeBench / SWE-Bench scores), and rock-bottom pricing — especially with the ongoing Pro discount.

Quick Highlights (as of early May 2026)

  • Release date: April 24, 2026 (preview)
  • Context: Native 1M tokens (with practical efficiency improvements for real agent/document workflows)
  • Reasoning modes: Non-think (fast), Think High, Think Max (deeper, higher quality on hard tasks) — all from the same weights
  • API pricing (highly competitive): Flash is extremely cheap; Pro has a big temporary discount (extended to ~May 31 in some updates) + major input cache price drop (1/10th)
  • Strengths: Coding/agentic tasks, long-context efficiency, price/performance. Text-only for now (multimodal planned later).
  • Availability: Chat at chat.deepseek.com (Expert/Instant modes), API (OpenAI/Anthropic compatible), open weights on HF/ModelScope.

Sources: Official announcement, Hugging Face collection, Tech Report, tweet discount extended

Federico UlfoFederico Ulfo
Models

OpenAI: GPT-5.5, Goblin Mode, Symphony & Realtime

GPT 5.5

image.png OpenAI shipped GPT-5.5 — an incremental but meaningful step on the way to GPT-6. The release keeps OpenAI in the conversation while Anthropic and DeepSeek crowd the frontier from both sides.

Sources: OpenAI announcement

GPT goes in Goblin Mode

"Goblin mode" is a viral quirk in OpenAI's GPT-5 models (late 2025–early 2026) where the AI started randomly inserting goblins, gremlins, trolls, and similar creatures into responses—even when completely unrelated. Cause: Over-reinforcement during training for the "Nerdy" personality. Playful goblin metaphors scored high on "fun/quirky," so the behavior spread wildly. Fix: Open AI fixed it by adding this to the system prompt, twice!

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.
...
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.

Screenshot.png

Sources: OpenAI, Amanda Askell, tweet

Symphony

Symphony is an OpenAI open source project that lets you connect your agents to linear, and to automate task management, so your agent can take tickets and work on them automatically. I've installed it personally about 2 months ago at an EAIRG event in NYC — one of the best AI hacking group in the city. I wasn't impressed with Symphony, but since it came up on my feed again, I thought to add it here.

Sources: tweet, symphony link

GPT-Realtime-2

  • GPT-Realtime-2 for voice agents that reason and take action
  • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages
  • GPT-Realtime-Whisper, making transcription even faster

Sources: tweet

Mira Murati email exchange with Sam Altman leaks:

Screenshot.png

Sources: tweet

Federico UlfoFederico Ulfo
Models

Measuring What Frontier Models Know (IKP)

  • Bojie Li introduces Incompressible Knowledge Probes (IKP), 1,400 obscure factual questions across 7 tiers of difficulty, to measure factual recall in 188 models from 27 vendors including closed APIs.
  • Factual accuracy scales log-linearly with log(model parameters) on open-weight models (R²=0.917), allowing black-box size estimates: GPT-5.5 ~9T, Claude Opus 4.6 ~5T, with wide uncertainty ranges noted in follow-up.
  • Over three years, factual capacity shows no compression at fixed parameter counts, rejecting the Densing Law prediction of knowledge densification, while reasoning benchmarks saturate.

Estimated size per model:

  • GPT-5.5 ~9T
  • Claude Opus 4.7 ~4T
  • GPT-5.4 ~2.2T
  • Claude Sonnet 4.6 ~1.7T
  • Gemini 2.5 Pro ~1.2T

chart 1

Sources: tweet, paper, ikp

Federico UlfoFederico Ulfo
Models

Opus 4.6 Was Dumbed Down

Users noticed Opus 4.6 quality slipped during peak hours. Anthropic eventually acknowledged compute rationing — same pattern we covered in Part 1.

Claude 4.7

Sources: tweet

Federico UlfoFederico Ulfo
Models

Decoupled DiLoCo

Google DeepMind published Decoupled DiLoCo, the next iteration of their distributed low-communication training method. It enables training across data centers (and potentially across the planet) with dramatically reduced inter-node bandwidth — a key unlock for the multi-region GPU fleets everyone is racing to build.

diloco

Sources: tweet, Google DeepMind

Federico UlfoFederico Ulfo
Models

Is AI Accelerating?

Ben Todd argues AI capability gains are still compounding — even if recent model releases feel incremental, the overall curve hasn’t slowed.

1) Benchmarks

Claude 4.6 and Mythos are roughly on trend across 37 post-2024 benchmarks. image.png But Mythos represents 6 months of progress while only scoring +2 on Anthropic’s internal ECI, which likely emphasizes agentic coding — the area most relevant to an intelligence explosion. image.png

2) Revenue

Revenue growth has accelerated over the last 3 years, driven largely by Anthropic growing faster than OpenAI. This may be the hardest benchmark to game since it reflects real customer spending. image.png

3) Productivity uplift

Anthropic says Claude 4.6 made researchers 2× more productive, and Mythos 4×. The true gains are probably lower — maybe ~1.2× and ~1.6× — but still enough to modestly accelerate AI progress.

4) Compute demand

AI chip rental prices had been falling ~30% annually as hardware improved. But over the last few months, prices have risen ~30%. That suggests demand for compute is outpacing supply, consistent with rapidly increasing capabilities and faster scaling. image.png

Sources: blog post, tweet

Federico UlfoFederico Ulfo
Models

DS4 by Antirez

Salvatore Sanfilippo (Antirez, of Redis fame) dropped DS4, a narrow-bet inference engine that runs DeepSeek V4 Flash locally on Apple Silicon (Metal) and Linux (CUDA). Not a generic GGUF runner. It's DS4-Flash-specific, with an OpenAI/Anthropic-compatible server you can point Claude Code at. Two ideas worth stealing: a 2-bit quantization that actually works (only the routed MoE experts get quantized; shared experts and projections stay untouched), which runs the model on a 128GB MacBook Pro.

image.png

It calls tools reliably under coding agents and treating the KV cache as a first-class disk citizen, hashed by SHA1 of the rendered prefix so stateless API clients reuse cached state across sessions and restarts. Antirez also says openly that DS4 was built with strong assistance from GPT-5.5 — refreshingly honest about how high-end systems code gets written in 2026.

Sources: github, @antirez, tweet

Federico UlfoFederico Ulfo

Search

Search across events, members, and blog posts