AI Socratic May 2025

May 20, 2025 · Updated February 2, 2026 · 9 min read

The most important AI news and updates from last month (April 15 - May 15). A beefy month!

Let's start with events and conferences.

May 21st, AI Dinner 10.0

Another AI dinner at the Greycroft office. The focus this month will be on A2A, the top 30 Ilya Sutskever papers, AlphaEvolve, and the latest in AI. We'll use this blog post you're reading right now to structure our conversations!

AI NYC is a community of AI researchers, engineers, and founders. We meet once a month for a symposium of Socratic discussion on research papers and LLMs, philosophizing about the latest in AI.

Conferences

There are two major conferences happening next month. To decide which one to attend, I scraped the speakers and events from both and put them in a Google Sheet, which you can check here: https://docs.google.com/spreadsheets/d/13Z2PKyFtbQaZm8iDguzxTZ4gukZnAI2BCg9-cVJFqfA/edit?gid=2008052934#gid=2008052934

I decided to go to SF this time around. The main reason is that the conference brings the best AI speakers together in one place, while AI Tech Week is scattered all over the place and has way too much noise.

AI Engineer, SF, June 3-5, https://www.ai.engineer. Here's a discount code for you: THANKSFEULF.
AI Tech Week, NY, June 2-8, https://www.tech-week.com.

OK, now let's dive into the latest from the month!

Google I/O

Google I/O 2025 doubled down on Gemini-powered agents, AI-first Android, and a dash of new hardware: the clearest signal yet that “Google does the Googling” for you.

Key Highlights

AI Mode for Search – rolls out to all U.S. users, running dozens of Gemini-driven sub-queries and soon tapping Project Mariner to carry out up to 10 web tasks with a Teach-and-Repeat workflow.
Gemini 2.5 & Chrome – Deep-Think reasoning mode lands for complex math/code, while Gemini comes native to Chrome for tab-wide summarization and navigation.
Imagen 4, Veo 3 & Flow – next-gen image and video models plus the Flow AI filmmaking app let creators stitch 8-second clips into longer AI movies.
Project Astra upgrades – the multimodal agent goes proactive with Search Live, speaking up unprompted and handling tasks as you point your camera.
Android 16 preview – Material 3 Expressive redesign, AI weather-reactive wallpapers, scam-call shields, Private Space, and system-wide Gemini hooks.
Wear OS 6 – gets the same Material 3 flair, adaptive circular UI and a 10% battery bump for Pixel Watch and beyond.
Project Aura XR glasses – Xreal partnership teases wide-FOV smart glasses with on-device Gemini assistance.

Google shipped really hard with this.

AlphaEvolve 🦠

AlphaEvolve was received positively after its 14 May 2025 reveal. Powered by Gemini 2.0 models, the evolutionary coding agent discovers and refines algorithms that are already saving compute, speeding up hardware, and cracking open math problems.

Key Highlights

Superhuman Algorithms – beats the 56-year-old Strassen method for 4 × 4 complex matrix multiplication, using 48 scalar multiplications instead of 49.
Compute Savings – a new Borg scheduling heuristic recovers ≈0.7% of Google’s global compute fleet.
Hardware & Training Boosts – 23% faster Gemini kernel (1% shorter training) and 32.5% FlashAttention speed-up; lean Verilog redesign ships in next-gen TPUs.
Evolutionary Engine – pairs Gemini Flash for breadth with Gemini Pro for depth, guided by automated evaluators.
Broad Discovery – improved 20% of 50+ open math problems and rediscovered 75% of known best results.
Early Access – academic EAP sign-ups open, wider rollout under exploration.
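To make the "evolutionary engine guided by automated evaluators" idea concrete, here is a minimal toy sketch of a propose, evaluate, select loop. It is not AlphaEvolve's implementation: the candidates are plain lists of numbers rather than programs, the `mutate` function is a random perturbation standing in for the LLM proposal step, and the objective in `evaluate` is an invented example. All names here are assumptions made for illustration.

```python
import random


def evaluate(candidate):
    """Automated evaluator: higher is better.

    Toy objective (an assumption): every coordinate should be close to 3.
    """
    return -sum((x - 3.0) ** 2 for x in candidate)


def mutate(candidate, rng, scale=0.5):
    """Stand-in for the LLM proposal step: perturb one coordinate at random."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def evolve(generations=200, population_size=8, seed=0):
    """Run the loop and return the best candidate found."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-5, 5) for _ in range(3)] for _ in range(population_size)
    ]
    for _ in range(generations):
        # Breadth: propose many cheap variants of existing candidates.
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Selection: the evaluator decides which parents and children survive.
        population = sorted(population + children, key=evaluate, reverse=True)[
            :population_size
        ]
    return population[0]


best = evolve()
print(best, evaluate(best))
```

Because parents compete with their children, the best score never gets worse from one generation to the next. In the real system the evaluator is the expensive, domain-specific part (for example, timing a kernel or checking a proof), and that is what lets the search be trusted.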

How It Shines

Provably Novel – solutions are mathematically verified as new, not memorized.
Real-World Impact – live in data-centers, chip design, and LLM training pipelines today.
Engineer-Friendly – outputs human-readable code, easing adoption and debugging.
Open Horizons – same framework targets materials science, drug discovery, sustainability, and more.

AlphaEvolve is DeepMind’s boldest leap toward AI-driven scientific discovery—an agent that literally evolves code, freeing humans to focus on bigger ideas.

In this episode of Machine Learning Street Talk, the team that worked on AlphaEvolve goes into the details of the breakthrough and their insights:

https://youtu.be/vC9nAosXrJw?si=pu3UjCJzJYImRgn-

Codex

Codex runs on codex-1, a specialized evolution of OpenAI's o3 reasoning model fine-tuned for software engineering tasks.

Key Highlights

Parallel Tasking: Executes writing features, bug fixes, tests, and codebase queries concurrently in isolated cloud sandboxes, with tasks completing in 1–30 minutes.
Verifiable Actions: Every task provides terminal logs, test outputs, and change citations for transparent review & integration.
Configurable Agent Behavior: Use AGENTS.md files to instruct Codex on codebase navigation, testing commands, and project conventions.

How It Shines

Coding Proficiency: Excels on internal SWE-Bench Verified evaluations, delivering clean, review-ready patches. (LinkedIn)
Autonomous Collaboration: Proposes pull requests, refactors large codebases, and answers complex code queries independently. (WSJ, @EconomicTimes)
Security-Focused: Runs within sandboxed containers with no internet access during execution and is trained to refuse malicious software requests. (@EconomicTimes)
Async Workflow Revolution: Shifts development from linear task queues to parallel AI-driven task delegation, keeping engineers in flow longer. (Medium)

Codex launched as a research preview to gather feedback, prioritize safety, and iterate rapidly in one of the most competitive spaces, alongside GitHub Copilot, Google Gemini, Anthropic Claude, and emerging startups.

OpenAI’s vision is a unified developer experience where real-time pairing and asynchronous agent workflows converge: imagine editing code in your IDE, spawning Codex tasks on demand, and receiving progress updates & results without context switching. (LinkedIn)

Codex is reportedly more capable at handling multi-step parallel coding tasks than standalone o3-based code models. In my experience, for quick suggestions Copilot still feels snappier, but Codex’s parallelism is unmatched when you need to orchestrate complex refactors & testing pipelines.

https://www.youtube.com/watch?time_continue=6&v=wSAkqlzSZyw

"Your App Is Just A ChatGPT Wrapper" they said!

https://x.com/t31kx/status/1921214839961067734

Updates

The pope actually chose the name Leo XIV because of AI: https://x.com/VaticanNews/status/1921186921838997935.
Sam Altman and Jony Ive hint at a new personal AI product: https://x.com/sama/status/1925242282523103408.

LLM Models Vibe Check & Benchmarks

Top models according to OpenRouter: notably, Gemini 2.5 is climbing the ladder, while Anthropic's Claude 3.7 is slowly going down.

Companies are overfitting their models to the benchmarks. @lmarena_ai has become the go-to evaluation for AI progress. Their latest release demonstrates the difficulty of maintaining fair evaluations on @lmarena_ai, despite best intentions. Read more.

Benchmarks collection from Hugging Face

IQ benchmark changes in just one year: o3 has an IQ of 160, placing it among the 100,000 smartest people in the world.

Research Papers

Absolute Zero: Reinforced Self-Play Reasoning with Zero Data. The AI learns to reason by inventing and solving its own Python coding challenges using RL, with no human data needed. Author explanation: https://x.com/_AndrewZhao/status/1919920459748909288.
Flow-GRPO
ZeroSearch: incentivizing search in LLMs without searching. ZeroSearch is a curriculum-based RL framework that teaches LLMs to retrieve information using self-generated documents: https://x.com/omarsar0/status/1920469148968362407

Continuous Thought Machines (CTM)

Sakana proposes a new neural architecture (CTM) built from the ground up to use neural dynamics as a core representation for intelligence. With neural dynamics as a first-class citizen, CTM shows some interesting emergent behavior. CTMs are also naturally easier to interpret.

A CTM can decide to think less once it finds a pattern, a process similar to how humans think, which saves energy.

https://x.com/hardmaru/status/1921751428508582329
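The "think less once it finds a pattern" behavior is a form of adaptive computation. The sketch below is not the CTM architecture. It is a generic toy loop, under invented assumptions, that accumulates evidence over internal "ticks" and stops as soon as one class is confident enough. The function names, the fixed per-tick evidence, and the 0.9 threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def think(evidence_per_tick, max_ticks=50, confidence_threshold=0.9):
    """Accumulate evidence tick by tick and stop early once confident.

    Returns (predicted_class, ticks_used).
    """
    logits = [0.0] * len(evidence_per_tick)
    for tick in range(1, max_ticks + 1):
        # Each internal tick adds a little more evidence for every class.
        logits = [l + e for l, e in zip(logits, evidence_per_tick)]
        probs = softmax(logits)
        if max(probs) >= confidence_threshold:
            break  # pattern found: no need to keep thinking
    return probs.index(max(probs)), tick


easy = think([1.0, 0.0, 0.0])  # clear pattern, strong evidence per tick
hard = think([0.2, 0.1, 0.0])  # ambiguous pattern, weak evidence per tick
print(easy, hard)
```

The easy input crosses the confidence threshold after a handful of ticks, while the ambiguous one needs many more, so compute is spent only where the problem demands it.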

VC and Fundraising

OpenAI acquires Windsurf for $3B, completing the hilarious pattern of an Ouroboros. Are we in an AI bubble?

Insights on OpenAI buying Windsurf and appointing a CEO of Applications: applications are becoming routers across different models. This acquisition reduces multiplexing to other models and enables a seamless, fully vertical integration.

Insights on why the AI wave is different, as seen in Cursor's rise from $100M to $300M ARR in a few months. The thesis:

The AI wave is different from the cloud wave.
AI is being bought, while SaaS was being sold. AI products are pulled by customers, so they grow faster.
A data advantage might be the only/ultimate moat in AI. GitHub Copilot had the data, distribution, and resource advantages, but Cursor is still winning thanks to better UX.

The Zeki Data report shows that AI tools are disrupting traditional hiring. Below-zero net hiring this year means there were more layoffs than hires.

Another chart shows how the inflow and outflow of talent between the US and India is shifting the other way.

How LLMs do arithmetic, lol

https://x.com/andrew_n_carr/status/1913603612430983665

Videos and Podcasts

Rich Sutton on AI alignment and Decentralization [15 min video]
"The short version is that I don't agree with AI-safety folks about what question we should be asking. Rather than asking how we can control the goals of the AIs, I think we should be asking how we can have a good future without controlling their goals (just as we have a pretty good present without controlling other peoples' goals)." - Richard Sutton

https://www.youtube.com/watch?v=Hnt-oBA086U&t=85s

Random Thoughts And Updates

https://x.com/goyal__pramod/status/1921944575842644206

When should you use which OpenAI model? OpenAI finally published a guide that explains it. Very useful: at least until GPT-5 is out, we'll continue using several GPT models.

OpenAI's New Roadmap — AI for Education

The Vesuvius Challenge found the title of a scroll for the first time! The scroll is "On Vices, Book 1" by Philodemus. Read more.

https://x.com/frantzfries/status/1920199640021971059

The Intelligence Curse: in the April edition of AI Socratic we examined ai-2027.com and AI 2045. This blog post, like the others, explores what will happen when AGI is here and how to avoid a disaster.

https://x.com/luke_drago_/status/1915376929542111353

A GPT model stopped learning Croatian 🇭🇷 and nobody could figure out why. It turns out Croatian users (HRLF) were more prone to downvote messages. Lol. Read more.

TikTok, Google, and Meta can run human experiments at scale. Is that good or bad? Read any famous psychology experiment and the sample size is around 40 people; meanwhile, ByteDance has a sample size of 2B people. Read more.

AI For Builders

JSON uses many more tokens than alternative formats. Read more.
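A quick way to see where the overhead comes from is to serialize the same records as JSON and as CSV and compare sizes. This is a rough sketch: the records are made up, CSV is just one possible alternative, and character count is only a proxy for tokens (real counts depend on the tokenizer of the model you use).

```python
import csv
import io
import json

# Hypothetical records, invented for this comparison.
records = [
    {"id": 1, "name": "Ada", "role": "engineer"},
    {"id": 2, "name": "Grace", "role": "researcher"},
    {"id": 3, "name": "Alan", "role": "founder"},
]


def to_json(rows):
    """JSON repeats every key, plus quotes and braces, for every record."""
    return json.dumps(rows)


def to_csv(rows):
    """CSV states the keys once in a header row, then only the values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print(f"JSON: {len(json_text)} chars, CSV: {len(csv_text)} chars")
```

The gap grows with the number of records, since JSON pays for the key names on every row while CSV pays for them once. The trade-off is that CSV loses nesting and types, so it only fits flat, uniform data.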

Loops you can take home to your mother

https://x.com/kentskooking/status/1922570670132604967

https://x.com/kentskooking/status/1921464932119286053

About the Authors

Federico Ulfo

Founder, Engineer

New York City
