AI Socratic
May 2026
Models

DeepSeek V4

DeepSeek just dropped V4 (preview) — two open-weights MoE models that push the frontier on cost-effective 1M-token context.

  • DeepSeek-V4-Pro: 1.6T total parameters (49B active). Flagship performance rivaling top closed models in reasoning, math, and agentic coding.
  • DeepSeek-V4-Flash: 284B total parameters (13B active). Faster, cheaper, and highly efficient for everyday and agent tasks.


Both feature a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) that makes million-token contexts dramatically more practical (much lower FLOPs and KV cache than V3). MIT license, available on Hugging Face (base + instruct), and live on the DeepSeek API today.
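Why does 1M tokens need a new attention design? A back-of-the-envelope sketch helps. It assumes a generic grouped-query-attention transformer with a made-up config (60 layers, 8 KV heads, head dimension 128, bf16) and models compression as a single hypothetical `compression` divisor on the number of cached positions. This is not DeepSeek V4's architecture: V3's MLA and V4's CSA/HCA compress the cache in their own, more elaborate ways.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_elem=2, compression=1):
    """Approximate KV-cache size for one sequence.

    Standard count: one K and one V vector per layer, KV head, and cached position.
    `compression` is a hypothetical knob that shrinks the number of cached positions
    (e.g. by pooling tokens); it is NOT DeepSeek's actual CSA/HCA scheme.
    """
    cached_positions = -(-tokens // compression)  # ceiling division
    return 2 * layers * kv_heads * head_dim * cached_positions * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical config: 60 layers, 8 KV heads, head_dim 128, bf16 (2 bytes).
    full = kv_cache_bytes(60, 8, 128, 1_000_000)
    pooled = kv_cache_bytes(60, 8, 128, 1_000_000, compression=4)
    print(f"uncompressed: {full / 1e9:.1f} GB, 4x compressed: {pooled / 1e9:.1f} GB")
```

With these made-up numbers, the uncompressed cache for a single 1M-token sequence is roughly 246 GB, and a 4x reduction brings it to roughly 61 GB. At that scale, cache compression rather than raw context length decides whether million-token contexts are practical.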

The community is already praising the efficiency gains, strong coding/agent results (e.g., high LiveCodeBench / SWE-Bench scores), and rock-bottom pricing — especially with the ongoing Pro discount.

Quick Highlights (as of early May 2026)

  • Release date: April 24, 2026 (preview)
  • Context: Native 1M tokens (with practical efficiency improvements for real agent/document workflows)
  • Reasoning modes: Non-think (fast), Think High, Think Max (deeper, higher quality on hard tasks) — all from the same weights
  • API pricing (highly competitive): Flash is extremely cheap; Pro has a large temporary discount (extended to ~May 31 in some updates) plus a major input-cache price drop (to 1/10th)
  • Strengths: Coding/agentic tasks, long-context efficiency, price/performance. Text-only for now (multimodal planned later).
  • Availability: Chat at chat.deepseek.com (Expert/Instant modes), API (OpenAI/Anthropic compatible), open weights on HF/ModelScope.

Sources: Official announcement, Hugging Face collection, Tech Report, tweet discount extended

Federico Ulfo
Models

OpenAI: GPT-5.5, Goblin Mode, Symphony & Realtime

GPT-5.5

OpenAI shipped GPT-5.5, an incremental but meaningful step on the way to GPT-6. The release keeps OpenAI in the conversation while Anthropic and DeepSeek crowd the frontier from both sides.

Sources: OpenAI announcement

GPT goes into Goblin Mode

"Goblin mode" is a viral quirk in OpenAI's GPT-5 models (late 2025–early 2026): the model started randomly inserting goblins, gremlins, trolls, and similar creatures into responses, even when they were completely unrelated. Cause: over-reinforcement during training for the "Nerdy" personality. Playful goblin metaphors scored high on "fun/quirky," so the behavior spread widely. Fix: OpenAI added this line to the system prompt, twice!

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.
...
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.


Sources: OpenAI, Amanda Askell, tweet

Symphony

Symphony is an open-source OpenAI project that connects your agents to Linear to automate task management, so an agent can pick up tickets and work on them automatically. I installed it myself about two months ago at an EAIRG event in NYC (one of the best AI hacking groups in the city). I wasn't impressed with Symphony, but since it came up on my feed again, I thought I'd add it here.

Sources: tweet, symphony link

GPT-Realtime-2

  • GPT-Realtime-2 for voice agents that reason and take action
  • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages
  • GPT-Realtime-Whisper, making transcription even faster

Sources: tweet

Mira Murati's email exchange with Sam Altman leaked:


Sources: tweet

Research

Anthropic: Natural Language Autoencoders (NLAs)


Models don't always say what they think: their internal reasoning lives in neural activations that humans can't read directly. Anthropic introduces Natural Language Autoencoders, which train models to convert internal neural activations into readable text, bridging the gap between numerical "thoughts" and human language. In safety tests, NLAs revealed hidden model behaviors such as planning rhymes in advance in poetry tasks, awareness of being evaluated in blackmail scenarios, and covert cheating strategies during coding evaluations.

Sources: tweet

Research

Sakana AI × NVIDIA: Sparser, Faster, Lighter Transformer (TwELL)

Sakana AI and NVIDIA's ICML 2026 paper introduces TwELL, a new sparse format for LLM feedforward layers that achieves >95% unstructured sparsity (via ReLU plus a light L1 penalty) while staying fully compatible with fast GPU tiled matrix multiplies. The result: 20%+ faster inference and training, lower memory and energy use on billion-scale models, and open-source CUDA kernels, all with minimal accuracy loss.
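The ingredients named above (ReLU, a light L1 penalty, and a fixed-width sparse layout) can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions: `pack_ell` uses a generic ELLPACK-style layout (a fixed number of slots per row, padded with index -1), which only stands in for the paper's TwELL format; its GPU-tile-aware details are not reproduced here. The random pre-activations are hypothetical.

```python
import random


def relu(xs):
    # Negative pre-activations become exact zeros: the source of the sparsity.
    return [max(0.0, x) for x in xs]


def l1_penalty(acts, lam=1e-3):
    # Light L1 term added to the training loss; it nudges more activations to zero.
    return lam * sum(abs(a) for a in acts)


def sparsity(acts):
    # Fraction of activations that are exactly zero (unstructured sparsity).
    return sum(a == 0.0 for a in acts) / len(acts)


def pack_ell(acts, width):
    # Generic ELLPACK-style packing of one row: nonzero values plus their column
    # indices, padded to a fixed width so every row has the same shape.
    nz = [(i, a) for i, a in enumerate(acts) if a != 0.0]
    if len(nz) > width:
        raise ValueError("row has more nonzeros than the ELL width")
    pad = width - len(nz)
    return [a for _, a in nz] + [0.0] * pad, [i for i, _ in nz] + [-1] * pad


def ell_matvec(vals, idxs, weight):
    # Down-projection h @ W touching only the stored nonzeros.
    # `weight` is a list of rows, one per hidden unit.
    out = [0.0] * len(weight[0])
    for v, i in zip(vals, idxs):
        if i < 0:
            continue  # padding slot
        for j, w in enumerate(weight[i]):
            out[j] += v * w
    return out


def dense_matvec(acts, weight):
    # Reference dense h @ W for comparison.
    out = [0.0] * len(weight[0])
    for a, row in zip(acts, weight):
        for j, w in enumerate(row):
            out[j] += a * w
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    hidden, d_model = 64, 8
    # Hypothetical pre-activations biased negative so ReLU zeroes most of them.
    h = relu([rng.gauss(-1.5, 1.0) for _ in range(hidden)])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(hidden)]
    vals, idxs = pack_ell(h, width=sum(a != 0.0 for a in h))
    print(f"sparsity={sparsity(h):.2f}, L1={l1_penalty(h):.4f}")
    print(ell_matvec(vals, idxs, W) == dense_matvec(h, W))
```

The sparse and dense products agree because the skipped entries are exact zeros; any real speedup depends on GPU kernels that pure Python cannot show.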


Sources: tweet, blog, paper

Research

The First Law of Complexodynamics


Scott Aaronson asks why physical systems become more “interesting” before settling into disorder, even though entropy only increases. Using a coffee cup example (separate → swirling patterns → fully mixed), he proposes “complextropy”: a resource-bounded version of Kolmogorov sophistication measuring the shortest efficient program that can generate states resembling the observed one. Efficiency constraints are crucial; without them, the measure is trivial. He conjectures complextropy follows a small-large-small pattern over time and suggests testing it experimentally with compression-based approximations on simulations.
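Here is a minimal sketch of the kind of experiment he suggests, under our own assumptions: a toy "coffee cup" grid, random neighbor swaps as mixing, and zlib applied to a coarse-grained picture of the cup as the complexity proxy. Coarse-grained compressed size is only a rough, computable stand-in for complextropy, not Aaronson's definition, and all function names are hypothetical.

```python
import random
import zlib


def make_cup(n):
    # n x n grid: cream (1) on top, coffee (0) below.
    return [[1 if r < n // 2 else 0 for _ in range(n)] for r in range(n)]


def mix(grid, rng, swaps):
    # Random swaps of neighboring cells: a crude stand-in for diffusion.
    n = len(grid)
    for _ in range(swaps):
        r, c = rng.randrange(n), rng.randrange(n)
        dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        r2, c2 = (r + dr) % n, (c + dc) % n
        grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]


def coarse_grain(grid, block):
    # Average each block x block patch, then bucket it into three levels:
    # 0 = mostly coffee, 1 = mixed, 2 = mostly cream. Assumes n % block == 0.
    n = len(grid)
    out = []
    for r in range(0, n, block):
        row = []
        for c in range(0, n, block):
            frac = sum(grid[i][j] for i in range(r, r + block)
                       for j in range(c, c + block)) / block ** 2
            row.append(0 if frac < 1 / 3 else 2 if frac > 2 / 3 else 1)
        out.append(row)
    return out


def compressed_size(grid):
    # zlib output length: a computable upper-bound proxy for description length.
    return len(zlib.compress(bytes(v for row in grid for v in row), 9))


def complexity_curve(n=64, block=8, steps=30, swaps=20_000, seed=0):
    # Compressed size of the coarse-grained cup, sampled as mixing proceeds.
    rng = random.Random(seed)
    grid = make_cup(n)
    sizes = []
    for _ in range(steps):
        sizes.append(compressed_size(coarse_grain(grid, block)))
        mix(grid, rng, swaps)
    return sizes


if __name__ == "__main__":
    print(complexity_curve())
```

Coarse-graining is what gives the proxy a chance to fall again at the end: a fully mixed cup averages to a uniform "mixed" picture that compresses well, whereas compressing the raw cells would mostly track entropy and keep rising. Whether this toy actually shows the small-large-small shape depends on the grid size, block size, and mixing rate.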

Sources: paper

Vibe Coding

The Unreasonable Effectiveness of HTML

@Thariq from Claude Code suggests using HTML instead of Markdown files. To me this sounds like the classic "never ask a barber whether you need a haircut," but @Karpathy also confirms that HTML is an excellent way to structure LLM responses, since you can add tables and images, which pack much more information than plain text.

Audio is the human-preferred input to AIs, but vision (images, animations, video) is the preferred output to humans. Karpathy points out that roughly a third of our brain is a massively parallel processor dedicated to vision.

Worth exploring!
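If you want to try it, here is a minimal sketch of an answer rendered as a self-contained HTML page instead of Markdown. The helper names are our own, and the numbers are just the DeepSeek V4 parameter counts from the top of this post; the point is the format: a table plus an inline SVG bar, which plain Markdown cannot draw natively.

```python
from html import escape

# Figures from the DeepSeek V4 section of this post: (name, total params, active params in B).
MODELS = [("DeepSeek-V4-Pro", "1.6T", 49), ("DeepSeek-V4-Flash", "284B", 13)]


def svg_bar(value, max_value, width=120, height=12):
    # Inline SVG bar: a visual that plain Markdown has no native syntax for.
    w = round(width * value / max_value)
    return (f'<svg width="{width}" height="{height}">'
            f'<rect width="{w}" height="{height}" fill="steelblue"/></svg>')


def render_report(models):
    # One self-contained HTML page: a table with an inline bar per row.
    top = max(active for _, _, active in models)
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(total)}</td>"
        f"<td>{active}B {svg_bar(active, top)}</td></tr>"
        for name, total, active in models
    )
    return ("<!doctype html><html><body><table>"
            "<tr><th>Model</th><th>Total params</th><th>Active params</th></tr>"
            f"{rows}</table></body></html>")


if __name__ == "__main__":
    print(render_report(MODELS))
```

Escaping every model-supplied string with `html.escape` matters once an LLM is generating the content, since a stray `<` would otherwise break the page.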

Sources: tweet

Models

Measuring What Frontier Models Know (IKP)

  • Bojie Li introduces Incompressible Knowledge Probes (IKP), 1,400 obscure factual questions across 7 tiers of difficulty, to measure factual recall in 188 models from 27 vendors including closed APIs.
  • Factual accuracy scales linearly with log(model parameters) on open-weight models (R²=0.917), which allows black-box size estimates: GPT-5.5 ~9T, Claude Opus 4.6 ~5T, with wide uncertainty ranges noted in a follow-up.
  • Over three years, factual capacity shows no compression at fixed parameter counts, rejecting the Densing Law prediction of knowledge densification, while reasoning benchmarks saturate.
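The size-estimation trick in the second bullet is a regression run backwards. Below is a minimal sketch, assuming a plain least-squares fit of accuracy against log10(parameters); the four data points are made up and are not IKP results, and the paper's actual fitting procedure may differ.

```python
import math


def fit_log_linear(params, accuracy):
    # Ordinary least squares for: accuracy ≈ a + b * log10(params).
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mx, my = sum(xs) / n, sum(accuracy) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, accuracy))
    ss_tot = sum((y - my) ** 2 for y in accuracy)
    return a, b, 1 - ss_res / ss_tot  # intercept, slope per 10x params, R²


def estimate_params(accuracy, a, b):
    # Invert the fit: a closed model's measured accuracy -> implied parameter count.
    return 10 ** ((accuracy - a) / b)


if __name__ == "__main__":
    # Made-up open-weight points lying exactly on a line (NOT IKP data).
    sizes = [1e9, 1e10, 1e11, 1e12]
    accs = [0.10, 0.25, 0.40, 0.55]
    a, b, r2 = fit_log_linear(sizes, accs)
    print(f"slope per 10x params: {b:.3f}, R^2: {r2:.3f}")
    print(f"implied size at 70% accuracy: {estimate_params(0.70, a, b):.2e}")
    # An accuracy error of 0.03 shifts the estimate by 0.03 / b decades.
    print(f"size multiplier per 0.03 error: {10 ** (0.03 / b):.2f}x")
```

Because size enters through a logarithm, errors compound multiplicatively: with a slope of 0.15 per 10x parameters, a 0.03 error in accuracy moves the estimate by 0.2 decades, roughly 1.6x. That is one reason black-box size estimates come with wide uncertainty ranges.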

Estimated size per model:

  • GPT-5.5 ~9T
  • Claude Opus 4.7 ~4T
  • GPT-5.4 ~2.2T
  • Claude Sonnet 4.6 ~1.7T
  • Gemini 2.5 Pro ~1.2T


Sources: tweet, paper, ikp

Models

Opus 4.6 Was Dumbed Down

Users noticed that Opus 4.6 quality slipped during peak hours. Anthropic eventually acknowledged compute rationing, the same pattern we covered in Part 1.

Claude 4.7

Sources: tweet

Models

Decoupled DiLoCo

Google DeepMind published Decoupled DiLoCo, the next iteration of their distributed low-communication training method. It enables training across data centers (and potentially across the planet) with dramatically reduced inter-node bandwidth — a key unlock for the multi-region GPU fleets everyone is racing to build.
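For intuition, here is a minimal sketch of the original DiLoCo recipe on a toy problem: several workers each take many local steps, then exchange a single averaged parameter delta that an outer Nesterov-momentum optimizer applies. The quadratic losses, hyperparameters, and function names are assumptions for illustration, plain SGD replaces DiLoCo's AdamW inner optimizer, and the changes introduced in Decoupled DiLoCo are not modeled here.

```python
def local_train(theta, target, steps, lr):
    # Inner loop on one worker: gradient descent on 0.5 * ||theta - target||^2.
    # (DiLoCo uses AdamW here; plain SGD keeps the sketch short.)
    for _ in range(steps):
        theta = [t - lr * (t - c) for t, c in zip(theta, target)]
    return theta


def diloco(theta, targets, rounds=50, inner_steps=10, inner_lr=0.1,
           outer_lr=0.7, momentum=0.9):
    # Each target stands for one worker's (or data center's) local data.
    velocity = [0.0] * len(theta)
    for _ in range(rounds):
        # Workers train independently for `inner_steps` with no communication.
        local = [local_train(theta, t, inner_steps, inner_lr) for t in targets]
        # Outer pseudo-gradient: average of (global - local), one sync per round.
        delta = [sum(theta[i] - w[i] for w in local) / len(local)
                 for i in range(len(theta))]
        # Nesterov-momentum outer step on the pseudo-gradient.
        velocity = [momentum * v + d for v, d in zip(velocity, delta)]
        theta = [t - outer_lr * (momentum * v + d)
                 for t, v, d in zip(theta, velocity, delta)]
    return theta


if __name__ == "__main__":
    # Three hypothetical workers whose losses disagree; the joint optimum is their mean.
    workers = [[1.0, 2.0], [3.0, -2.0], [5.0, 0.0]]
    print(diloco([0.0, 0.0], workers))  # should approach [3.0, 0.0]
```

The bandwidth saving comes from the sync frequency: workers communicate once every `inner_steps` steps instead of every step, so inter-node traffic drops by roughly that factor.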


Sources: tweet, Google DeepMind

Models

Is AI Accelerating?

Ben Todd argues AI capability gains are still compounding — even if recent model releases feel incremental, the overall curve hasn’t slowed.

1) Benchmarks

Claude 4.6 and Mythos are roughly on trend across 37 post-2024 benchmarks. But Mythos represents 6 months of progress while only scoring +2 on Anthropic’s internal ECI, which likely emphasizes agentic coding: the area most relevant to an intelligence explosion.

2) Revenue

Revenue growth has accelerated over the last 3 years, driven largely by Anthropic growing faster than OpenAI. This may be the hardest benchmark to game since it reflects real customer spending.

3) Productivity uplift

Anthropic says Claude 4.6 made researchers 2× more productive, and Mythos 4×. The true gains are probably lower — maybe ~1.2× and ~1.6× — but still enough to modestly accelerate AI progress.

4) Compute demand

AI chip rental prices had been falling ~30% annually as hardware improved. But over the last few months, prices have risen ~30%. That suggests demand for compute is outpacing supply, consistent with rapidly increasing capabilities and faster scaling.

Sources: blog post, tweet

Models

DS4 by Antirez

Salvatore Sanfilippo (Antirez, of Redis fame) dropped DS4, a narrow-bet inference engine that runs DeepSeek V4 Flash locally on Apple Silicon (Metal) and Linux (CUDA). It is not a generic GGUF runner: it is specific to DeepSeek V4 Flash and ships an OpenAI/Anthropic-compatible server you can point Claude Code at. There are two ideas worth stealing. The first is a 2-bit quantization that actually works (only the routed MoE experts get quantized; shared experts and projections stay untouched), which lets the model run on a 128GB MacBook Pro.
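Here is a minimal sketch of the selective part of that idea, assuming a simple per-group asymmetric 2-bit quantizer (4 levels evenly spaced between each group's min and max). This is a generic scheme chosen for illustration, not DS4's actual quantization format, and the tensor names and the `.experts.` matching rule are hypothetical.

```python
def quantize_2bit(values, group_size=4):
    # Asymmetric 2-bit quantization per group: 4 levels evenly spanning [min, max].
    # Codes are kept as Python ints for readability; a real kernel packs 4 per byte.
    codes, params = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 3 or 1.0  # 3 steps between 4 levels; flat group -> any scale
        params.append((lo, scale))
        codes.extend(min(3, max(0, round((v - lo) / scale))) for v in group)
    return codes, params


def dequantize_2bit(codes, params, group_size=4):
    # Map each 2-bit code back to lo + code * scale for its group.
    out = []
    for g, (lo, scale) in enumerate(params):
        out.extend(lo + q * scale for q in codes[g * group_size:(g + 1) * group_size])
    return out


def quantize_model(tensors, is_routed_expert):
    # Selective quantization: only tensors picked by the predicate get 2-bit codes;
    # everything else (shared experts, projections, ...) stays at full precision.
    return {
        name: ("q2", quantize_2bit(w)) if is_routed_expert(name) else ("fp", w)
        for name, w in tensors.items()
    }


if __name__ == "__main__":
    # Hypothetical tensor names; real checkpoints use their own naming scheme.
    model = {
        "layers.0.experts.7.w1": [0.12, -0.40, 0.33, 0.05, 0.90, -0.91, 0.10, 0.00],
        "layers.0.shared_expert.w1": [0.12, -0.40, 0.33, 0.05],
        "layers.0.attn.q_proj": [0.20, 0.10, -0.30, 0.40],
    }
    packed = quantize_model(model, lambda name: ".experts." in name)
    for name, (kind, _) in packed.items():
        print(name, kind)
```

In an MoE model the routed experts hold most of the parameters, so quantizing only them captures most of the memory saving while the always-active shared weights keep full precision.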


The quantized model also calls tools reliably under coding agents. The second idea is treating the KV cache as a first-class disk citizen, hashed by SHA1 of the rendered prefix, so stateless API clients reuse cached state across sessions and restarts. Antirez also says openly that DS4 was built with strong assistance from GPT-5.5, which is refreshingly honest about how high-end systems code gets written in 2026.
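Here is a minimal sketch of the disk-cache idea, assuming the server can serialize whatever state it computed for a prompt prefix. The class, the `.kv` file naming, pickle-based storage, and cutting at chat-turn boundaries are all our own illustrative choices, not DS4's actual on-disk format; SHA1 serves only as a content address here, not as a security measure.

```python
import hashlib
import os
import pickle
import tempfile


class DiskPrefixCache:
    """Hypothetical sketch: persist per-prefix model state on disk, keyed by SHA1
    of the rendered prompt prefix, so it survives client reconnects and restarts."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, prefix):
        digest = hashlib.sha1(prefix.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest + ".kv")

    def save(self, prefix, state):
        with open(self._path(prefix), "wb") as f:
            pickle.dump(state, f)

    def load(self, prefix):
        # pickle is fine for a trusted local cache; never load untrusted files with it.
        path = self._path(prefix)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            return pickle.load(f)

    def longest_cached_prefix(self, rendered, cut_points):
        # Try candidate cut points (e.g. the end of each chat turn), longest first.
        for cut in sorted(cut_points, reverse=True):
            state = self.load(rendered[:cut])
            if state is not None:
                return cut, state
        return 0, None


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    turn1 = "<system>be terse<user>hi"
    DiskPrefixCache(root).save(turn1, {"n_tokens": 7})  # stand-in for real KV tensors
    # A fresh instance simulates a server restart; the stateless client resends everything.
    rendered = turn1 + "<assistant>hello<user>next question"
    cut, state = DiskPrefixCache(root).longest_cached_prefix(rendered, [len(turn1), len(rendered)])
    print(cut == len(turn1), state)  # reuse the cached prefix, compute only the new suffix
```

Keying by a hash of the rendered text means the cache needs no session IDs: any client that sends the same prefix, before or after a restart, lands on the same file.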

Sources: github, @antirez, tweet


Fiber-optic cable costs up 8x

Fiber optics is still being used on the battlefield, although not as much as it used to be. It's extremely pricey now: we used to buy a 50km spool for $300, and now it's easily $2,500. At least that's one positive second-order effect of the war in the Middle East: it's making the war in Ukraine more expensive.

Sources: tweet


SpaceX × Cursor

SpaceX adopted Cursor across engineering. A meaningful enterprise win for Cursor and a signal that frontier hardware shops are betting their dev productivity on AI-native IDEs.

Sources: tweet

Videos & Podcasts

Dwarkesh Blackboard Lectures

Dwarkesh recently started a new blackboard lecture series with some of the top researchers and engineers in the space, and we are all here for it 🙌

How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

Reiner Pope gives a blackboard-style walkthrough of how frontier LLMs are trained and deployed, showing how much of the AI industry’s inner workings can be inferred from equations, API pricing, and first principles.

What rebuilding AlphaGo teaches us about self-play, RL, and the future of LLMs – Eric Jang

Eric Jang explains how rebuilding AlphaGo with modern AI tools reveals core principles of intelligence—search, self-play, and learning—and why its MCTS-based reinforcement learning may offer a better model for how future AIs and humans learn than today’s token-level RL in LLMs.

Chip design from the bottom up – Reiner Pope

How chips actually work, starting with basic logic gates and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do. Reiner is CEO of MatX, a new chip startup, and previously worked at Google on software efficiency, compilers, and TPU architecture.

Videos & Podcasts

AI Ascent 2026 by Sequoia Capital

Sequoia Capital's AI Ascent 2026 convened Greg Brockman, Andrej Karpathy, Demis Hassabis, Boris Cherny, Dmitri Dolgov, and more with 150+ leading founders and researchers to discuss the present and future of AI.

Fireside Chat: Sequoia × Karpathy

  1. LLMs enable new primitives: apps fully engulfed by LLMs, "install .md, not .sh", knowledge systems over arbitrary unstructured data.
  2. LLM jaggedness: a model can refactor a 100k-line codebase and still fail basic tasks — increasingly it’s about both verifiability and economics: frontier labs heavily optimize domains with strong reward signals and large TAMs.
  3. The agent-native economy: products decomposing into sensors, actuators, and logic; systems designed to be maximally legible to LLMs; and the rise of agentic engineering as a new discipline.

Sources: Full playlist, tweet

Random

Random — quick links

  • Claude Code finds the password of a locked Bitcoin wallet: tweet
  • Casimir Effect to power a battery from the quantum field, hence battery-free. Likely bullshit, but let's see: tweet
  • Terence Tao — 5 Stages of AI Grief: tweet
  • Karpathy's nanoGPT running at 50K tokens/sec on an FPGA (and 3M/sec on an M4 MacBook): tweet
  • Animal Translator: tweet
  • Cool hair: tweet
  • You can't outsource understanding — Karpathy's line of the month: tweet
  • Dwarkesh hot take: tweet
  • The "language tax" — non-English speakers pay more compute per token: tweet
  • How cells move — beautiful microscopy: tweet
  • Placebo sleep affects cognition: tweet
  • Mars terraforming: tweet
  • Solved an Erdős problem with no advanced math knowledge: tweet
  • Wayback Machine: tweet
  • Nobody checks compiler code: tweet
  • Top research papers of the month: tweet
