Updates — Voices from the AI Socratic Community

December 2025

Dec 17, 2025 · Random

Upcoming Events: AI Aperitivo 2.0 (Milan) & AI Dinner 16.0 (NYC)

AI Aperitivo 2.0

Milano · Tuesday, December 16

AI Builders Milan hosts the second AI Aperitivo 🍸🍷🫒🧀 — an evening of Socratic dialogues with Milan's top AI engineers, researchers, and founders.

AI Dinner 16.0

New York · Wednesday, December 17

AI NYC hosts another AI Dinner 🍲🍕🍺 — we'll discuss news and updates using this blog post to run the Socratic dialogues.

Federico Ulfo

Dec 17, 2025 · Models

Model Wars: GPT-5.2 vs Opus 4.5 vs Gemini 3 vs Grok 4.1

The "Model Wars" have intensified with major releases from all top providers, focusing heavily on reasoning and efficiency.

GPT-5.2: OpenAI’s latest step is less “bigger model” and more “better worker”. It ships in Instant, Thinking, and Pro variants tuned for deep, multi-step knowledge work (coding, long-context synthesis, and tool-heavy agent workflows like spreadsheets and presentations). On ARC-AGI-2 (Verified), GPT-5.2 Thinking posts 52.9% and Pro reaches 54.2%, positioning it as OpenAI’s flagship for coding and agentic tasks. Even at higher per-token pricing, it’s pitched as cheaper per unit of quality thanks to improved token efficiency (note: GPT-5.1 had already signaled large efficiency gains, reportedly reaching o3-level performance at 150x lower cost).
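The "cheaper per unit of quality despite higher per-token pricing" argument is simple arithmetic, and a minimal sketch may make it concrete. The prices and token counts below are hypothetical placeholders chosen for illustration, not quoted OpenAI pricing or measured usage; `cost_per_task` is a helper name introduced here.

```python
def cost_per_task(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one task, given token counts and per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical older model: lower prices, but it needs more output tokens.
older = cost_per_task(20_000, 40_000, price_in_per_m=1.25, price_out_per_m=10.0)

# Hypothetical newer model: 40% higher prices, but half the output tokens
# to reach the same result on the same task.
newer = cost_per_task(20_000, 20_000, price_in_per_m=1.75, price_out_per_m=14.0)

print(f"older: ${older:.3f} per task")
print(f"newer: ${newer:.3f} per task")
print("newer is cheaper per task:", newer < older)
```

Under these assumed numbers the newer model costs less per completed task even though every token is more expensive, because output tokens dominate the bill and it uses fewer of them.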

Grok 4.1:

Gemini 3.0: Google released Gemini 3 (Pro and Flash), billed as a massive leap over GPT-5.1 in reasoning, speed, and video. It reportedly "one-shotted" an entire website build, leading some to declare front-end development "dead".

Claude Opus 4.5: Anthropic's new flagship model is a significant step forward. It outperforms its predecessors while being priced well below previous Opus models. Notably, it reportedly embeds reasoning directly into files when traces are disabled, and it is marketed as the best model for coding and agentic computer use. Many engineers in the community consider it the best coding model available.

Model Wars Cycle

ARC Prize Leaderboard

Federico Ulfo

Dec 17, 2025 · Macro & Geopolitics

US AI startups increasingly built on Chinese open-source foundations

Chinese open-source models (like DeepSeek and Qwen) have surpassed US models in global downloads (17% vs 15.8% market share).

Risks: imported censorship or ideology baked into the weights; regulatory surprises if the US decides some of those models are "foreign critical tech."

Payoff: price/performance and context length that are very attractive to early-stage founders.

DeepSeek V3.2 was also released.

🤗 Hugging Face Nation Map: top 12 nations ranked by all-time Hugging Face downloads

Developer and National Market Share

Model Size Distribution

Model Modality Distribution

Federico Ulfo

Dec 17, 2025 · Vibe Coding

Google launches Antigravity, an agent-first IDE

Google launched "Antigravity," an agent-first IDE positioning itself as a direct competitor to Cursor. It features Gemini 3 Pro and browser control for automated testing.

Controversy: Varun Mohan joined Google, leaving his Windsurf team behind. Antigravity appears to carry over Windsurf code, to the point that the coding agent's name was not even changed.

Federico Ulfo

Dec 17, 2025 · Vibe Coding

Cursor releases Composer 2.0 with agentic browser

Cursor released Composer 2.0 with an agentic browser that allows parallel agents to code and self-test, claiming a 99.9% cost reduction compared to traditional dev teams.

Federico Ulfo

Dec 17, 2025 · Vibe Coding

Opus 4.5 now available in Claude Code

Opus 4.5 is now available in Claude Code.

 * ▐▛███▜▌ *   Claude Code v2.0.69
* ▝▜█████▛▘ *  Opus 4.5 · Claude Max
 *  ▘▘ ▝▝  *   ~/projects/aisocratic

Federico Ulfo

Dec 17, 2025 · Vibe Coding

The Shift: senior engineers accept more AI code than juniors

Senior engineers are accepting more AI code than juniors because they know how to prompt and decompose work effectively: agents are amplifying senior skill rather than replacing it.

Federico Ulfo

Dec 17, 2025 · Macro & Geopolitics

Genesis Mission: the White House's AI Manhattan Project

The intersection of AI and geopolitics has escalated to Manhattan Project levels.

The White House launched the Genesis Mission, a massive initiative using Department of Energy (DOE) supercomputers to build a national AI platform. The goal is to automate scientific research in biotech, nuclear, and quantum fields. This is a clear signal that the White House is favoring AI companies.

The Trump administration also recently approved the sale of NVIDIA H200 chips to China. Within 24 hours, China reaffirmed its ban on NVIDIA chips, claiming that Huawei is building something better.

ref: https://genesis.energy.gov/

Federico Ulfo

Dec 17, 2025 · Agents

Agentic AI Foundation (AAIF) launched under Linux Foundation; new MCP spec

Anthropic, OpenAI, and Block created the Agentic AI Foundation (AAIF) under the Linux Foundation, donating MCP, AGENTS.md, and goose as founding projects.

MCP got a new spec in late November, pushing it from "tool calling" into long-running, production-grade workflows.

Federico Ulfo

Dec 17, 2025 · Philosophy & Ethics

Claude used in Chinese state-sponsored hack attack

Recently, a Chinese state-sponsored attack used Claude to run 80-90% of the work, using MCP tools to harvest credentials, plant backdoors, and write exploits. The implication is that AI agents boost attacker scale and effectiveness. Take this with a grain of salt, though: Dario Amodei is focusing on the risks of AI and pushing for more restrictive regulation. He is spreading awareness, yes, but also fear, in support of stronger regulations that would benefit Anthropic.

Anthropic: Disrupting AI Espionage

Dario Amodei interview: https://www.youtube.com/embed/aAPpQC-3EyE?si=eJLwZFYiuwdFxx-I

Related to hack attacks: OpenAI disclosed a security incident at its analytics vendor Mixpanel, potentially exposing API user data including names and locations.

OpenAI Mixpanel Incident

Federico Ulfo

Dec 17, 2025 · Research

Google's Nested Learning paper: a new ML paradigm for continual learning

A new paper proposes treating neural networks as a hierarchy of learners that update parameters during inference, allowing for continual learning without forgetting. Some see it as a potential "next Transformer" moment.

https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

Federico Ulfo

Dec 17, 2025 · Research

Sakana AI — Continuous Thought Machines (CTM)

The Continuous Thought Machine (CTM) is an AI model that uses the synchronization of neuron activity as its core reasoning mechanism, inspired by biological neural networks. Unlike traditional artificial neural networks, the CTM uses timing information at the neuron level, which allows for more complex neural behavior and decision-making. This lets the model “think” through problems step by step, making its reasoning process more interpretable and human-like. Sakana's research reports improvements in both problem-solving capability and efficiency across various tasks. The CTM is presented as a meaningful step toward bridging the gap between artificial and biological neural networks, potentially unlocking new frontiers in AI capabilities.
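To give a feel for what "synchronization of neuron activity" could mean in practice, here is a toy sketch. It assumes synchronization is measured as the inner product between two neurons' activation histories over internal ticks; that is a simplification made for illustration, not Sakana's implementation, and the function name `synchronization` and the sine-wave "neurons" are hypothetical.

```python
import math

def synchronization(history):
    """history[i] is neuron i's activations over T ticks.

    Returns an N x N matrix whose (i, j) entry is the inner product of
    the two neurons' activation histories (an assumed, simplified measure).
    """
    n = len(history)
    return [
        [sum(a * b for a, b in zip(history[i], history[j])) for j in range(n)]
        for i in range(n)
    ]

# Three toy neurons observed over 50 ticks (illustrative signals only).
ticks = range(50)
in_phase_a = [math.sin(0.5 * t) for t in ticks]
in_phase_b = [math.sin(0.5 * t) for t in ticks]
anti_phase = [-math.sin(0.5 * t) for t in ticks]

sync = synchronization([in_phase_a, in_phase_b, anti_phase])

# Neurons firing together get a positive entry; opposed neurons a negative one.
print("in-phase pair:  ", round(sync[0][1], 3))
print("anti-phase pair:", round(sync[0][2], 3))
```

In this simplified picture, the matrix entries (rather than any single neuron's current activation) are what a downstream layer would read, which is why the timing of activity matters.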

https://sakana.ai/ctm/

Federico Ulfo

Dec 17, 2025 · Fundraising & Startups

OpenAI's $1.4 Trillion infrastructure bet

OpenAI is projecting $100B in revenue by 2027 but is committing a staggering $1.4 Trillion to infrastructure.

Federico Ulfo

Dec 17, 2025 · Models

Nano Banana Pro: turn earnings PDFs into infographics and slides

Nano Banana Pro: A standout tool this month for visuals. It can compress entire earnings PDFs into infographics, generate insights from papers, and create slides.

Federico Ulfo

Dec 17, 2025 · Research

SAM 3 (Segment Anything 3) released by Scale AI and Meta

SAM 3 (Segment Anything 3): Scale AI and Meta released SAM 3 for open-source image/video segmentation and 3D reconstruction.

Federico Ulfo

Dec 17, 2025 · Models

Intellect-3: 100B+ MoE trained with decentralized compute by Prime Intellect

Intellect-3: A 100B+ parameter MoE (Mixture of Experts) model released by Prime Intellect (PI), trained using decentralized computing. It shows state-of-the-art performance in math and code.

Federico Ulfo

Dec 17, 2025 · Models

NotebookLM gets Deep Research

NotebookLM now has Deep Research.

Federico Ulfo

Dec 17, 2025 · Agents

Poetiq AI Agent surpasses 50% at ARC-AGI-2 at ~$50/task

Poetiq AI Agent surpasses 50% at ARC-AGI-2, reaching superhuman performance at ~$50/task, half the cost of previous SOTA, suggesting agent scaffolding may be more important than raw model capability for certain reasoning tasks.

Federico Ulfo

Dec 17, 2025 · Fundraising & Startups

Disney invests $1B into OpenAI with 3-year Sora licensing deal

Disney invested $1B in OpenAI and signed a 3-year licensing deal that lets Sora use Disney, Marvel, Pixar, and Star Wars characters, a gigantic signal about IP and AI video.

Federico Ulfo

Dec 17, 2025 · Models

OpenAI silently testing next-gen image backend "Image 2"

OpenAI is silently testing its next-gen image backend that people are informally calling "Image 2", allegedly considered in the same frontier tier as Nano Banana Pro.

Federico Ulfo

Dec 17, 2025 · Research

SimWorld: an open-ended simulator for agents in physical and social worlds

SimWorld is an open-ended, realistic simulator for autonomous agents in physical and social worlds. The researchers built a tiny economy in which different models participate in a market and are challenged to make money, for example through food delivery. Claude and Qwen did pretty well by taking a very risky approach, while other models played a more risk-averse game. As a result, Claude's and Qwen's results show high returns but also a large standard deviation.

It's also hilarious to see how OpenAI lost its contract because Qwen and DeepSeek underbid it. https://simworld.org/

Federico Ulfo

Dec 17, 2025 · Videos & Podcasts

Ilya Sutskever on the Dwarkesh Podcast: "back at the age of research"

We are back at the age of research.

Federico Ulfo

Dec 17, 2025 · Videos & Podcasts

Podcast: Sakana – Continuous Thought Machines (CTM)

A new episode dives deep into Sakana AI's Continuous Thought Machines, exploring the underlying science and engineering of CTM and its parallels to biological neural timing and reasoning. If you're interested in the intersection of neuroscience and advanced AI research, this gives strong background and accessible explanations.

Federico Ulfo

Dec 17, 2025 · Videos & Podcasts

The Thinking Game: a documentary journey into DeepMind

A journey into the heart of DeepMind, capturing a team striving to unravel the mysteries of intelligence and life itself.

Federico Ulfo

Dec 17, 2025 · Videos & Podcasts

Jeff Dean on Important AI Trends (Stanford AI Club)

Jeff Dean (Google DeepMind, cofounder of Google Brain & TensorFlow) spoke at Stanford AI Club on the biggest shifts in AI: foundation models scaling, better hardware (TPUs), tool-using agents, multimodal models, and why responsible deployment and real-world feedback matter most.

Federico Ulfo
