AI Socratic
December 2025
Random

Upcoming Events: AI Aperitivo 2.0 (Milan) & AI Dinner 16.0 (NYC)

AI Aperitivo 2.0

Milano · Tuesday, December 16

AI Builders Milan hosts the second AI Aperitivo 🍸🍷🫒🧀 — an evening of Socratic dialogues with Milan's top AI engineers, researchers, and founders.
RSVP →

AI Aperitivo Milan

AI Dinner 16.0

New York · Wednesday, December 17

AI NYC hosts another AI Dinner 🍲🍕🍺 — we'll discuss news and updates using this blog post to run the Socratic dialogues.
RSVP →

AI Dinner NYC
Federico Ulfo
Models

Model Wars: GPT-5.2 vs Opus 4.5 vs Gemini 3 vs Grok 4.1

The "Model Wars" have intensified with major releases from all top providers, focusing heavily on reasoning and efficiency.

GPT-5.2: OpenAI’s latest step is less “bigger model” and more “better worker”. It ships in Instant / Thinking / Pro variants tuned for deep, multi-step knowledge work (coding, long-context synthesis, and tool-heavy agent workflows like spreadsheets and presentations). On ARC-AGI-2 (Verified), GPT-5.2 Thinking posts 52.9% and Pro reaches 54.2%, positioning it as OpenAI’s flagship for coding and agentic tasks. Even at higher per-token pricing, it’s pitched as cheaper per unit of quality thanks to improved token efficiency (note: GPT-5.1 had already signaled massive efficiency gains, reaching o3-level performance at 150x lower cost).

Grok 4.1:

Gemini 3.0: Google released Gemini 3 (Pro and Flash), pitched as a massive leap over GPT-5.1 in reasoning, speed, and video. It reportedly "one-shotted" an entire website build, leading some to declare front-end development "dead".

Claude Opus 4.5: Anthropic's new flagship model is a significant breakthrough. It outperforms its predecessors while costing far less per token than earlier Opus models. Notably, it embeds reasoning directly into files when traces are disabled and is marketed as the best model for coding and agentic computer use. Many engineers already consider it the best coding model.
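
The "cheaper per unit of quality despite higher per-token pricing" argument made for GPT-5.2 above comes down to simple arithmetic: the cost of a task is the per-token price multiplied by the tokens the model needs to finish it. Here is a minimal sketch of that calculation. The prices and token counts are hypothetical placeholders chosen for illustration, not real pricing or measured usage for any model.

```python
def cost_per_task(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task: per-token price times tokens consumed."""
    return price_per_million_tokens * tokens_used / 1_000_000


# Hypothetical numbers for illustration only (not real pricing or usage).
# The "newer" model charges more per token but needs half as many tokens.
older_model = cost_per_task(price_per_million_tokens=10.0, tokens_used=60_000)
newer_model = cost_per_task(price_per_million_tokens=14.0, tokens_used=30_000)

print(f"older model: ${older_model:.2f} per task")
print(f"newer model: ${newer_model:.2f} per task")
```

Under these assumed numbers, a 40% higher per-token price is more than offset by halving the tokens spent per task.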

Model Wars Cycle

ARC Prize Leaderboard

Federico Ulfo

US AI startups increasingly built on Chinese open-source foundations

Chinese open-source models (like DeepSeek and Qwen) have surpassed US models in global downloads (17% vs 15.8% market share).

Risks: imported censorship/ideology in weights; regulatory surprises if US decides some of those models are "foreign critical tech."

Payoff: price/performance and context length that are very attractive to early-stage founders.

DeepSeek v3.2 was also released.

Top 12 nations ranked by all-time Hugging Face downloads 🤗

Developer and National Market Share

Model Size Distribution

Model Modality Distribution

Federico Ulfo
Vibe Coding

Google launches Antigravity, an agent-first IDE

Google launched "Antigravity," an agent-first IDE positioning itself as a direct competitor to Cursor. It features Gemini 3 Pro and browser control for automated testing.

Controversy: Varun Mohan joined Google, leaving his team behind. Antigravity carries over Windsurf code, to the point that the coding agent's name was not even changed.

Federico Ulfo

Genesis Mission: the White House's AI Manhattan Project

The intersection of AI and geopolitics has escalated to Manhattan Project levels.

The White House launched the Genesis Mission, a massive initiative using Department of Energy (DOE) supercomputers to build a national AI platform. The goal is to automate scientific research in biotech, nuclear, and quantum fields. This is a clear signal that the White House is favoring AI companies.

The Trump administration also recently approved the sale of H200 chips to China, which, in less than 24 hours, confirmed its ban on all NVIDIA chips, claiming Huawei is building something better.

ref: https://genesis.energy.gov/

Federico Ulfo

Claude used in Chinese state-sponsored hack attack

Recently, a Chinese state-sponsored group used Claude to run 80-90% of an attack campaign, using MCP tools to harvest credentials, plant backdoors, and write exploits. The implication is that AI agents boost attacker scale and effectiveness. Take Dario Amodei's focus on AI risk with a grain of salt, though: he is spreading awareness, yes, but also fear, in support of stricter regulations that would benefit Anthropic.

Anthropic: Disrupting AI Espionage

Dario Amodei interview: https://www.youtube.com/embed/aAPpQC-3EyE?si=eJLwZFYiuwdFxx-I

Related to hack attacks, OpenAI disclosed a security incident at its analytics vendor Mixpanel, potentially compromising API user data including names and locations.

OpenAI Mixpanel Incident

Federico Ulfo
Research

Sakana AI — Continuous Thought Machines (CTM)

The Continuous Thought Machine (CTM) is an AI model that uses the synchronization of neuron activity as its core reasoning mechanism, inspired by biological neural networks. Unlike traditional artificial neural networks, the CTM uses timing information at the neuron level, which allows for more complex neural behavior and decision-making. This lets the model “think” through problems step by step, making its reasoning process more interpretable and human-like. Sakana's research reports improvements in both problem-solving capability and efficiency across various tasks. The CTM is presented as a meaningful step toward bridging the gap between artificial and biological neural networks, potentially unlocking new frontiers in AI capabilities.
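
To make "synchronization of neuron activity" concrete, here is a minimal sketch that assumes synchronization between two neurons is measured as the inner product of their activation histories over internal time steps. This is a simplification for illustration, not Sakana's implementation: the function names and toy traces are hypothetical, and the real model adds learned per-neuron components that are not shown here.

```python
import math


def synchronization(history_a, history_b):
    """Inner product of two neurons' activation traces over time."""
    return sum(a * b for a, b in zip(history_a, history_b))


def sync_matrix(histories):
    """Pairwise synchronization between every pair of neurons."""
    return [[synchronization(a, b) for b in histories] for a in histories]


# Hypothetical activation traces for 3 neurons over 8 internal "ticks".
ticks = range(8)
neuron_0 = [math.sin(t) for t in ticks]
neuron_1 = [math.sin(t) for t in ticks]    # fires in phase with neuron_0
neuron_2 = [-math.sin(t) for t in ticks]   # fires in anti-phase

S = sync_matrix([neuron_0, neuron_1, neuron_2])

for row in S:
    print([round(value, 3) for value in row])
```

Neurons that fire together get a positive entry and neurons in anti-phase get a negative one, so the matrix captures how activity relates over time rather than a single snapshot of activations.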

https://sakana.ai/ctm/

Federico Ulfo
Research

SimWorld: an open-ended simulator for agents in physical and social worlds

SimWorld is an open-ended, realistic simulator for autonomous agents in physical and social worlds. The researchers built a tiny economy in which different models participate in a market and are challenged to make money, for example through food delivery. Claude and Qwen did well by taking a very risky approach, while other models played a more risk-averse game; as a result, Claude's and Qwen's results showed high returns but also a large standard deviation.
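
The risk/return trade-off described above can be read off two numbers per agent: the mean profit and its standard deviation across runs. Here is a small sketch using Python's standard `statistics` module. The profit figures are made up for illustration and are not SimWorld results.

```python
from statistics import mean, stdev


def summarize(profits):
    """Return (average profit, sample standard deviation) across runs."""
    return mean(profits), stdev(profits)


# Hypothetical per-run profits (not data from the SimWorld paper).
risky_agent = [120, -40, 200, -10, 180]     # big wins, occasional losses
cautious_agent = [30, 25, 35, 28, 32]       # small, steady gains

risky_mean, risky_sd = summarize(risky_agent)
cautious_mean, cautious_sd = summarize(cautious_agent)

print(f"risky:    mean={risky_mean:.1f}, stdev={risky_sd:.1f}")
print(f"cautious: mean={cautious_mean:.1f}, stdev={cautious_sd:.1f}")
```

In this toy data the risky agent earns more on average but with a far wider spread, which is the pattern the post attributes to Claude and Qwen.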

It's also hilarious to see how OpenAI lost their contract because Qwen and DeepSeek underbid them. https://simworld.org/

Federico Ulfo
Videos & Podcasts

Podcast: Sakana – Continuous Thought Machines (CTM)

A new episode dives deep into Sakana AI's Continuous Thought Machines, exploring the underlying science and engineering of CTM and its parallels to biological neural timing and reasoning. If you're interested in the intersection of neuroscience and advanced AI research, this gives strong background and accessible explanations.

Federico Ulfo
Videos & Podcasts

Jeff Dean on Important AI Trends (Stanford AI Club)

Jeff Dean (Google DeepMind, cofounder of Google Brain & TensorFlow) spoke at Stanford AI Club on the biggest shifts in AI: foundation models scaling, better hardware (TPUs), tool-using agents, multimodal models, and why responsible deployment and real-world feedback matter most.

Federico Ulfo