AI Socratic
February 2026
Models

Google Gemini 3.1

Google released Gemini 3.1, continuing its push toward deeper reasoning and longer-horizon workflows. Key updates:

  • 1M+ token context
  • Stronger reasoning (big jump on ARC-AGI-style benchmarks)
  • Native multimodal: text, image, audio, video, code
  • Better tool use + structured outputs
  • Deeper integration across Gemini app, Workspace, and Vertex AI

Google’s angle is less about a single flashy demo and more about distribution + infrastructure leverage:

  • tight coupling with Search and Workspace
  • TPU-first optimization
  • enterprise rollout via Vertex


Sources: Google Gemini 3.1 announcement, DeepMind model card, Vertex AI rollout, ARC AGI-2

Federico Ulfo
Models

Anthropic Claude Opus 4.6 and Sonnet 4.6

Claude Opus 4.6

Improvements in Opus 4.6 over Opus 4.5:

  • Agent Team: it can run and coordinate sub-agents, and you can now access them individually.
  • 1M-token context
  • Compaction + Adaptive Thinking + Effort: it is more frugal in how it uses tokens.
  • Max thinking: if you have cash to burn, it runs 6x faster but costs 2.5 times more. We suspect it's using Cerebras or some other solution under the hood.

Claude Sonnet 4.6

Anthropic has also released Claude Sonnet 4.6, the latest update to its mid-tier model family, which is now the default model across both free and paid tiers on claude.ai and Claude Cowork. Sonnet 4.6 delivers stronger reasoning, better coding performance, improved computer use, and long-context reasoning, building on the 4.5 lineage. Key points of Sonnet 4.6:

  • 1M token context
  • Improved coding, reasoning, agent planning, knowledge work, and design.

Sonnet 4.6 narrows the gap between mid-tier and flagship performance, bringing many higher-end capabilities to broader users while retaining a cost-effective position relative to Opus offerings.

The benchmarks are solid, but we take them with a grain of salt, since models tend to overfit to them.

Sources: Claude Opus 4.6, tweet 1, tweet 2, tweet 3

Federico Ulfo
Models

Kimi K2.5 (Open Source, ~1T params, Swarms)

Kimi K2.5 is the clearest signal this month that “open source” is catching up to closed, proprietary models.

  • Open source
  • ~1T parameters
  • Beats Claude Opus 4.5 on multiple benchmarks (as reported by the community)
  • Uses agent swarms to cut execution time by up to ~4.5×
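
As a rough illustration of why a swarm can cut wall-clock time, the sketch below runs the same independent subtasks first one after another and then concurrently with Python's `asyncio`. This is not Kimi K2.5's actual orchestration: `run_subagent`, the task names, and the simulated latencies are hypothetical stand-ins for real model or tool calls.

```python
import asyncio
import time


async def run_subagent(task: str, latency_s: float) -> str:
    """Hypothetical sub-agent: sleeps to simulate a model or tool call."""
    await asyncio.sleep(latency_s)
    return f"{task}: done"


async def run_sequential(tasks: dict[str, float]) -> list[str]:
    # A single agent works through every subtask in turn.
    return [await run_subagent(name, latency) for name, latency in tasks.items()]


async def run_swarm(tasks: dict[str, float]) -> list[str]:
    # A coordinator fans subtasks out to concurrent sub-agents and gathers results.
    return await asyncio.gather(
        *(run_subagent(name, latency) for name, latency in tasks.items())
    )


def timed(coro_factory, tasks):
    # Run one strategy and return (results, elapsed wall-clock seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_factory(tasks))
    return results, time.perf_counter() - start


# Assumed workload: four independent subtasks with equal simulated latency.
TASKS = {"search": 0.05, "summarize": 0.05, "code": 0.05, "review": 0.05}

if __name__ == "__main__":
    _, t_seq = timed(run_sequential, TASKS)
    _, t_par = timed(run_swarm, TASKS)
    print(f"sequential: {t_seq:.2f}s  swarm: {t_par:.2f}s  speedup: {t_seq / t_par:.1f}x")
```

In this toy setup the speedup approaches the number of subtasks because they are fully independent. Real workloads have dependencies between steps and coordination overhead, which is presumably why the reported figure is an "up to" number.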

The metatrend: “model quality” is becoming less about single-shot IQ and more about systems:

  • planning loops
  • tool use
  • memory
  • coordination
  • and reliability under long horizons

Sources: tweet 1, Hugging Face

Federico Ulfo
Models

ByteDance Seedance 2.0 — The Movie Singularity Is Here

ByteDance just released Seedance 2.0, and it is the most impressive video generation model released yet. Seedance breaks the acrobatic benchmark and can depict real actors with incredible expressivity. Disney sent a cease-and-desist.

Camera actions

#### Explosions and collisions

#### ByteDance DGAF about copyright

#### Creators' content about to explode

Federico Ulfo