
The most important AI news and updates from last month: Apr 15, 2026 – May 4th, 2026.
DeepSeek just dropped V4 (preview) — two open-weights MoE models that push the frontier on cost-effective 1M-token context.
DeepSeek-V4-Pro: 1.6T total params (49B active) — flagship performance rivaling top closed models in reasoning, math, and agentic coding.
DeepSeek-V4-Flash: 284B total (13B active) — faster, cheaper, and highly efficient for everyday/agent tasks.
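To make the "total vs. active" numbers concrete, here is a small sketch that computes the share of weights each model uses per token. It relies only on the parameter counts quoted above; the dictionary layout and function name are illustrative, not anything from DeepSeek.

```python
# Parameter counts quoted above, in billions.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights a single token actually uses in an MoE model."""
    return active_b / total_b


for name, params in MODELS.items():
    print(f"{name}: {active_fraction(**params):.1%} of parameters active per token")
```

Both models activate only a few percent of their weights per token, which is where the cost advantage of the MoE design comes from.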

Both feature a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) that makes million-token contexts dramatically more practical (much lower FLOPs and KV cache than V3). MIT license, available on Hugging Face (base + instruct), and live on the DeepSeek API today.
The community is already praising the efficiency gains, strong coding/agent results (e.g., high LiveCodeBench / SWE-Bench scores), and rock-bottom pricing — especially with the ongoing Pro discount.
Sources: Official announcement, Hugging Face collection, Tech Report, tweet discount extended

OpenAI shipped GPT-5.5 — an incremental but meaningful step on the way to GPT-6. The release keeps OpenAI in the conversation while Anthropic and DeepSeek crowd the frontier from both sides.
Sources: OpenAI announcement
"Goblin mode" is a viral quirk in OpenAI's GPT-5 models (late 2025–early 2026) where the AI started randomly inserting goblins, gremlins, trolls, and similar creatures into responses, even when completely unrelated. Cause: over-reinforcement during training for the "Nerdy" personality. Playful goblin metaphors scored high on "fun/quirky," so the behavior spread wildly. Fix: OpenAI added this to the system prompt, twice!
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.
...
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.
Sources: OpenAI, Amanda Askell, tweet

Symphony is an OpenAI open-source project that connects your agents to Linear and automates task management, so your agent can pick up tickets and work on them automatically. I installed it about 2 months ago at an EAIRG event in NYC, one of the best AI hacking groups in the city. I wasn't impressed with Symphony, but since it came up on my feed again, I thought I'd add it here.
Sources: tweet, symphony link
Sources: tweet
Mira Murati email exchange with Sam Altman leaks:

Sources: tweet

Models don't always say what they think; instead, they encode their thinking into tokens that are not human-readable. Anthropic introduces a method that trains models to convert internal neural activations into readable text, bridging the gap between numerical "thoughts" and human language. In safety tests, NLAs revealed hidden model behaviors like advance rhyme planning in poetry tasks, awareness of being evaluated in blackmail scenarios, and covert cheating strategies during coding evaluations.
Sources: tweet
Sakana AI & NVIDIA's ICML 2026 paper introduces TwELL, a new sparse format for LLM feedforward layers that achieves >95% unstructured sparsity (via ReLU + light L1) while staying fully compatible with fast GPU tiled matrix multiplies. Result: 20%+ faster inference/training, lower memory & energy use on billion-scale models, with open-source CUDA kernels. Minimal accuracy loss.
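To illustrate why ReLU-induced sparsity can save work, here is a minimal pure-Python sketch. It is not the TwELL format or its CUDA kernels. It only shows that a feedforward product can skip every hidden unit that ReLU zeroed out. The negative shift on the pre-activations is an assumption standing in for what an L1 penalty encourages during training.

```python
import random


def relu(xs):
    return [x if x > 0.0 else 0.0 for x in xs]


def sparsity(xs):
    """Fraction of entries that are exactly zero."""
    return sum(1 for x in xs if x == 0.0) / len(xs)


def dense_matvec(weights, hidden):
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def sparse_matvec(weights, hidden):
    """Same product as dense_matvec, visiting only nonzero hidden units."""
    nonzero = [(j, h) for j, h in enumerate(hidden) if h != 0.0]
    return [sum(row[j] * h for j, h in nonzero) for row in weights]


random.seed(0)
# Pre-activations shifted negative so ReLU switches most units off.
pre = [random.gauss(-2.0, 1.0) for _ in range(1024)]
hidden = relu(pre)
weights = [[random.gauss(0.0, 1.0) for _ in range(1024)] for _ in range(8)]

print(f"sparsity after ReLU: {sparsity(hidden):.1%}")
```

The hard part on a GPU is making the skipped work compatible with tiled matrix multiplies, which is the problem the paper's format targets.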


Scott Aaronson asks why physical systems become more “interesting” before settling into disorder, even though entropy only increases. Using a coffee cup example (separate → swirling patterns → fully mixed), he proposes “complextropy”: a resource-bounded version of Kolmogorov sophistication measuring the shortest efficient program that can generate states resembling the observed one. Efficiency constraints are crucial; without them, the measure is trivial. He conjectures complextropy follows a small-large-small pattern over time and suggests testing it experimentally with compression-based approximations on simulations.
Sources: paper
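The compression-based test described above can be sketched in a few lines. This is a toy under stated assumptions, not the paper's experiment: a 1D row of cells stands in for the coffee cup, random neighbour swaps stand in for mixing, and the zlib-compressed size of a coarse-grained snapshot stands in for complextropy. The block size, bucket count, and swap counts are arbitrary choices.

```python
import random
import zlib


def coarse_grain(cells, block=16, levels=4):
    """Average each block of cells and bucket the mean into a few levels."""
    out = []
    for i in range(0, len(cells), block):
        chunk = cells[i:i + block]
        mean = sum(chunk) / len(chunk)
        out.append(min(levels - 1, int(mean * levels)))
    return bytes(out)


def complexity_estimate(cells):
    """Compressed size (bytes) of the coarse-grained state."""
    return len(zlib.compress(coarse_grain(cells), 9))


def mix(cells, swaps, rng):
    """Swap randomly chosen neighbouring cells in place."""
    for _ in range(swaps):
        i = rng.randrange(len(cells) - 1)
        cells[i], cells[i + 1] = cells[i + 1], cells[i]


rng = random.Random(0)
cells = [1] * 2048 + [0] * 2048  # cream on one side, coffee on the other
for step in range(6):
    print(step, complexity_estimate(cells))
    mix(cells, 50_000, rng)
```

Whether a toy like this shows the conjectured small-large-small curve depends heavily on the coarse-graining and on how long it runs. The point is only that the measurement itself is cheap to set up.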
~97% of your vector database is mathematically empty. Your RAG system is retrieving from noise.

Sources: tweet
Andrej Karpathy: "90% of your AI coding bill is paying for context you didn't need to send"
This Twitter article shows how to save $$$ by passing context only when needed, using the cache, and routing each request to the right model.
What compound savings:
Sources: tweet
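Why these savings compound rather than add can be shown with a little arithmetic. The sketch below assumes each technique removes a fraction of whatever bill remains after the previous one. The percentages are hypothetical placeholders for illustration, not figures from the article.

```python
def compounded_cost(baseline: float, reductions: list[float]) -> float:
    """Apply fractional reductions one after another.

    Each reduction is the fraction of the *remaining* bill it removes,
    so the effects multiply rather than add.
    """
    cost = baseline
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reduction must be between 0 and 1")
        cost *= 1.0 - r
    return cost


# Hypothetical savings per technique, for illustration only.
steps = {"trim unused context": 0.5, "prompt caching": 0.4, "model routing": 0.3}
remaining = compounded_cost(100.0, list(steps.values()))
print(f"${remaining:.2f} left of every $100")
```

With these made-up rates, three moderate savings leave about a fifth of the original bill, which is a larger cut than any single step suggests.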
@Thariq from Claude Code suggests using HTML instead of MD files. To me this sounds like the typical "never ask the barber if you need a haircut", but @Karpathy also confirms that HTML is actually an excellent way to structure LLM responses, since you can add tables and images, which pack much more information than pure text.
Audio is the human-preferred input to AIs, but vision (images/animations/video) is the preferred output to humans. Karpathy points out that roughly a third of our brain is a massively parallel processor dedicated to vision.
Worth exploring this!
Sources: tweet

Users noticed Opus 4.6 quality slipped during peak hours. Anthropic eventually acknowledged compute rationing — same pattern we covered in Part 1.
Sources: tweet
Google DeepMind published Decoupled DiLoCo, the next iteration of their distributed low-communication training method. It enables training across data centers (and potentially across the planet) with dramatically reduced inter-node bandwidth — a key unlock for the multi-region GPU fleets everyone is racing to build.

Sources: tweet, Google DeepMind
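The core low-communication idea (many local steps, then one sync) can be sketched on a toy problem. This is not the Decoupled variant, whose details are not given here. It is a simplified loop in the spirit of the DiLoCo family, with a scalar parameter, a quadratic loss per worker, and a plain averaging outer step as simplifying assumptions.

```python
def local_steps(theta, target, steps, lr):
    """Run gradient descent on 0.5 * (theta - target)**2 with no communication."""
    for _ in range(steps):
        grad = theta - target
        theta -= lr * grad
    return theta


def diloco_round(theta_global, targets, inner_steps=20, inner_lr=0.1, outer_lr=1.0):
    """One communication round: every worker trains locally, then a single sync."""
    deltas = []
    for target in targets:  # each worker holds its own data (here: one target)
        theta_local = local_steps(theta_global, target, inner_steps, inner_lr)
        deltas.append(theta_global - theta_local)  # "pseudo-gradient" sent over the network
    avg_delta = sum(deltas) / len(deltas)
    return theta_global - outer_lr * avg_delta


targets = [1.0, 3.0, 8.0]  # hypothetical per-worker optima
theta = 0.0
for _ in range(10):
    theta = diloco_round(theta, targets)
print(theta)
```

Here the workers exchange one number per round, so 200 local steps cost only 10 synchronizations. That ratio is what makes cross-datacenter training plausible when bandwidth is the bottleneck.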
Ben Todd argues AI capability gains are still compounding — even if recent model releases feel incremental, the overall curve hasn’t slowed.
Claude 4.6 and Mythos are roughly on trend across 37 post-2024 benchmarks.
But Mythos represents 6 months of progress while only scoring +2 on Anthropic’s internal ECI, which likely emphasizes agentic coding — the area most relevant to an intelligence explosion.

Anthropic says Claude 4.6 made researchers 2× more productive, and Mythos 4×. The true gains are probably lower — maybe ~1.2× and ~1.6× — but still enough to modestly accelerate AI progress.
Revenue growth has accelerated over the last 3 years, driven largely by Anthropic growing faster than OpenAI.
This may be the hardest benchmark to game since it reflects real customer spending.

AI chip rental prices had been falling ~30% annually as hardware improved. But over the last few months, prices have risen ~30%.
That suggests demand for compute is outpacing supply, consistent with rapidly increasing capabilities and faster scaling.

Salvatore Sanfilippo (Antirez, of Redis fame) dropped DS4, a narrow-bet inference engine that runs DeepSeek V4 Flash locally on Apple Silicon (Metal) and Linux (CUDA). It is not a generic GGUF runner: it's DS4-Flash-specific, with an OpenAI/Anthropic-compatible server you can point Claude Code at. Two ideas worth stealing: a 2-bit quantization that actually works (only the routed MoE experts get quantized; shared experts and projections stay untouched), which runs the model on a 128GB MacBook Pro.

Sources: github, @antirez, tweet
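The selective-quantization idea can be illustrated with a toy. This is a sketch of a generic group-wise 2-bit scheme, not DS4's actual format: the layer names, weights, and group size are hypothetical, and real 2-bit formats are considerably more elaborate. It shows the two ingredients mentioned above: four levels per weight, and quantizing only the routed experts.

```python
def quantize_2bit(weights, group_size=4):
    """Quantize floats to 2-bit codes (0..3) with a per-group offset and scale."""
    groups = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / 3 or 1.0  # 3 steps between 4 levels; avoid zero scale
        codes = [min(3, max(0, round((w - lo) / scale))) for w in chunk]
        groups.append((lo, scale, codes))
    return groups


def dequantize_2bit(groups):
    return [lo + code * scale for lo, scale, codes in groups for code in codes]


def pack_codes(codes):
    """Pack four 2-bit codes into each byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for shift, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * shift)
        out.append(byte)
    return bytes(out)


# Hypothetical layer table: only routed experts are quantized.
layers = {
    "routed_expert_0": [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.02, 0.27],
    "shared_expert": [0.10, -0.30, 0.20, 0.00],
    "attn_projection": [0.50, -0.10, 0.25, -0.45],
}
compressed = {
    name: quantize_2bit(w) if name.startswith("routed_expert") else w
    for name, w in layers.items()
}
```

Routed experts hold most of an MoE model's weights but each one is used for only a fraction of tokens, so squeezing them hard while leaving the always-active parts untouched is a reasonable place to spend the accuracy budget.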


Rankings by category show that frontier models have distinct strengths and tradeoffs
Source: tweet
🚧 🏗️ In Progress: last update on May 14th, 5:11pm Shanghai time
President Trump arrived in Beijing on May 13-14, 2026, for a two-day summit with Xi Jinping — his first visit to China in nearly a decade. He brought a powerful delegation of ~17 U.S. CEOs to push for greater market access, trade deals, and "opening up" China to American business.

The focus is on trade tariffs, AI/tech policy, Boeing/agriculture purchases, Taiwan, and Iran.
Expectations are quite modest: goodwill and incremental wins rather than big breakthroughs. The list of CEOs joining the delegation is impressive:
CEOs called early meetings “wonderful” and “incredible.”
Sources: BBC: Musk & Huang among CEOs on trip
Sources: The Kobeissi Letter, Xi Aura
China is committing roughly $1T to AI/energy infrastructure with a planned 30-year recoup horizon. Patient capital at a scale Western markets aren't structured to deploy.
Sources: tweet
Fiber optics is still in use on the battlefield, although not as much as it used to be. It's extremely pricey now. We used to buy a 50 km spool for $300; now it's easily $2,500. At least one positive second-order effect of the war in the Middle East: it's making the war in Ukraine more expensive.
Sources: tweet
SpaceX adopted Cursor across engineering. A meaningful enterprise win for Cursor and a signal that frontier hardware shops are betting their dev productivity on AI-native IDEs.
Sources: tweet
The rumored Meta acquisition of Manus fell through. Manus stays independent for now; Meta keeps shopping.

Sequoia and Lightspeed co-led Europe's largest seed funding round: $1.1B at $5.1B post-money for ex-DeepMind David Silver's Ineffable Intelligence. Silver was the lead behind AlphaGo and AlphaZero — investors are clearly paying for the pedigree as much as the product.
Sources: funding tweet, Ineffable Labs, [website](ineffable.ai)

Richard Dawkins went on record saying he believes "Claudia" may be conscious. One of the most prominent reductionist materialists of the last 50 years thinks AI might be conscious.


If LLMs can produce complex behavior from simple rules, then consciousness may not be a mystical add-on to physics.
Sources: tweet
Do you know how hard you have to abuse a mammal for them not to have children? — Connor Leahy
This quote is from a talk at the Nexus Conference in Amsterdam in 2025: 'Apocalypse Now: The Revelation of our Time'
Sources: Video Talk, tweet
Building on Tribe v1 (which we covered in March Part 2), Meta's predictive brain models are now being demoed at a fidelity that's making people uncomfortable. We're squarely in "decoded thoughts from neural data" territory.
Sources: tweet
Key points from this conversation:
Three themes:
Sources: tweet

Karpathy's nanoGPT running at 50K tokens/sec on an FPGA (and 3M/sec on an M4 MacBook). Wild numbers.
CasimirInc states that it has had a breakthrough in using the Casimir effect to power chips from the quantum field, making them battery-free. Since many scientists consider this pseudoscience, we'll call bullshit until we see it working. CasimirInc said they'll launch the chip in 2028. Keep an eye on it.
Source: tweet
Terence Tao's framing of how mathematicians (and the rest of us) work through what AI means for their craft.
Chinese researchers surpass Americans in papers published


Founder, Engineer
AI Socratic