AI Socratic
April 2026
Models

Anthropic Releases Opus 4.7

It's a decent improvement over Opus 4.6, but not a step-function improvement. What you need to know about Opus 4.7:

  • Takes instructions literally
  • Better vision improves computer use and the production of slides and other visual artifacts
  • Optimized for large-scale real-world analysis
  • Better at using file system-based memory
  • Costs 2x the tokens + uses 25% more tokens than Opus 4.6

Sources: tweet, AI Arena

Federico Ulfo
Models

Anthropic Mythos: Coding, Reasoning & Zero-Day Cybersecurity Capabilities

We briefly mentioned the new Anthropic model leak in the previous blog post. We now have more information about it:

  • Software engineering and coding — It acts like a senior-level engineer, spotting subtle bugs, self-correcting, and achieving high scores on benchmarks (e.g., ~93.9% on SWE-bench Verified vs. 80.8% for Opus 4.6).
  • Complex reasoning — Big jumps on math (e.g., much higher on USAMO 2026), science, and knowledge work.
  • Cybersecurity — This is the headline feature. It autonomously discovers and exploits zero-day vulnerabilities at a scale and speed that far exceeds previous models and even most expert humans.

Sources: Project Glasswing, tweet, tweet, tweet

Federico Ulfo
Models

OpenAI Releases ChatGPT Image 2

Image 2 is really good. I asked it to update the header image with this prompt:

make this image in studio ghibli and with more green and plants

It's incredibly good at combining multiple subjects while keeping the result coherent and the image quality high.

Gpt-image-2 can create an image of code that generates an SVG pelican ...

... and it almost passes the pelican test.

Sources: tweet, tweet, text-to-image arena bench, text-to-image arena bench 2

Federico Ulfo
Models

Google Releases Gemma 4 Open Models

Google DeepMind launched Gemma 4, a new family of open models under Apache 2.0. The small variants (26B MoE and 31B) outperform models over 10x their size on reasoning and agentic benchmarks while being optimized for on-device and local use.

  • Built-in function calling
  • Up to 256K context on the bigger models
  • Sizes range from phone/Raspberry Pi (E2B/E4B) to workstation (31B dense + 26B MoE with only ~4B active params for efficiency).

Sources: gemma 4

Federico Ulfo
Models

Xiaomi Releases MiMo-V2.5

Xiaomi's lineup includes MiMo-V2-Pro (1T+ total / 42B active parameters) and the open-weights MiMo-V2-Flash (309B total / 15B active). They are optimized for long-horizon agent workflows, with up to 1M context on Pro. Performance approaches Opus 4.6 level.

  • Pro autonomously handles 1,000+ tool calls
  • Flash delivers strong open-source coding performance (73.4% SWE-Bench Verified)
  • Hybrid attention + Multi-Token Prediction for efficient long-context reasoning and fast generation
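
To put the total/active parameter figures in perspective, here is a rough sketch of the arithmetic. It only uses the approximate numbers quoted above; treating "1T+" as exactly 1000B is an assumption made for illustration, and the helper name `active_share` is hypothetical.

```python
# Share of parameters active per token for the MoE models quoted in this post.
# Sizes are in billions of parameters. "1T+" is approximated as 1000B (assumption).
models = {
    "MiMo-V2-Pro": (1000, 42),
    "MiMo-V2-Flash": (309, 15),
}


def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of total parameters that are active per token."""
    return active_b / total_b


for name, (total_b, active_b) in models.items():
    share = active_share(total_b, active_b)
    print(f"{name}: about {share:.1%} of weights active per token")
```

Under these assumptions both models activate roughly 4 to 5% of their weights for each token, which is why a very large mixture-of-experts model can still generate quickly.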

Sources: Mimo 2.5

Federico Ulfo
