AI Socratic

The local-model crowd got fed too. Gemma 4: Apache 2.0, E2B to 31B, natively multimodal, up to 256K context, GGUFs on day one. It runs on 16GB machines, though r/LocalLLaMA promptly ran a face-off in which Qwen3.5-9B won 5 of 8 shared benchmarks.

A week later DeepMind open-sourced DiffusionGemma, a 26B MoE on the Gemma 4 backbone that ditches autoregression and denoises 256-token blocks in parallel at 1,000+ tokens/sec on a single H100. The diffusion bet is now a Google product line, not a paper.

Sources: Gemma 4, HN thread, DiffusionGemma, vLLM blog
