The most important AI news and updates from last month (April 15 - May 15). A beefy month!
Let's start with events and conferences.
Another AI dinner at the Greycroft office. The focus this month will be on A2A, top 30 Ilya Sutskever papers, Alpha Evolve, and the latest in AI — we'll use this blog post you're reading right now to structure our conversations!

AI NYC is a community of AI researchers, engineers, and founders. We meet once a month for a symposium, with Socratic discussions of research papers and LLMs, and some philosophizing about the latest in AI.
There are 2 major conferences happening next month. To decide which one to attend, I scraped the speakers and events from both and put them in a Google Sheet, which you can check here: https://docs.google.com/spreadsheets/d/13Z2PKyFtbQaZm8iDguzxTZ4gukZnAI2BCg9-cVJFqfA/edit?gid=2008052934#gid=2008052934
I decided to go to SF this time around. The main reason is that the conference brings the best AI speakers together in one place, while AI Tech Week is scattered all over the place and has way too much noise.
OK, now let's dive into the latest from the month!
Google I/O 2025 doubled down on Gemini-powered agents, AI-first Android, and a dash of new hardware: the clearest signal yet that “Google does the Googling” for you.

Google shipped really hard with this.
AlphaEvolve was received positively after its 14 May 2025 reveal. Powered by Gemini-2 models, the evolutionary coding agent discovers and refines algorithms that are already saving compute, speeding up hardware, and cracking open math problems.
AlphaEvolve is DeepMind’s boldest leap toward AI-driven scientific discovery—an agent that literally evolves code, freeing humans to focus on bigger ideas.
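To make "evolves code" concrete, here is a toy sketch of the propose, score, and select loop behind any evolutionary search. It is not DeepMind's implementation. The assumptions are all mine: a random perturbation stands in for the Gemini-written code edits, a list of three numbers stands in for a program, and the names `evaluate`, `mutate`, and `evolve` are hypothetical.

```python
import random


def evaluate(candidate):
    """Score a candidate; higher is better. The target is an illustrative assumption."""
    target = [3.0, -1.0, 2.0]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng, scale=0.5):
    """Stand-in for the model's proposal step: perturb one random position."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def evolve(generations=200, population_size=8, seed=0):
    """Propose children, score everything, keep the best (elitist selection)."""
    rng = random.Random(seed)
    population = [[0.0, 0.0, 0.0] for _ in range(population_size)]
    for _ in range(generations):
        children = [mutate(parent, rng) for parent in population]
        pool = population + children
        pool.sort(key=evaluate, reverse=True)
        population = pool[:population_size]
    return population[0]


best = evolve()
print(best, evaluate(best))
```

Because parents compete with their children for a slot in the next generation, the best score never gets worse from one generation to the next. In a real system the expensive parts are the two functions this sketch fakes: proposing useful edits and evaluating them automatically.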
In this episode of Machine Learning Street Talk, the team that worked on AlphaEvolve goes into the details of the breakthrough and their insights:
https://youtu.be/vC9nAosXrJw?si=pu3UjCJzJYImRgn-
OpenAI's Codex is powered by codex-1, a specialized version of the o3 reasoning model fine-tuned for software engineering tasks.
It launched as a research preview to gather feedback, prioritize safety, and iterate rapidly in one of the most competitive spaces, alongside GitHub Copilot, Google Gemini, Anthropic Claude, and emerging startups.
OpenAI’s vision is a unified developer experience where real-time pairing and asynchronous agent workflows converge: imagine editing code in your IDE, spawning Codex tasks on demand, and receiving progress updates and results without context switching.
Codex is reportedly more capable at handling multi-step parallel coding tasks than standalone o3-based code models. In my experience, for quick suggestions Copilot still feels snappier, but Codex’s parallelism is unmatched when you need to orchestrate complex refactors & testing pipelines.
https://www.youtube.com/watch?time_continue=6&v=wSAkqlzSZyw
https://x.com/t31kx/status/1921214839961067734


Top models according to OpenRouter. Notably, Gemini 2.5 is climbing the ladder while Anthropic's Claude 3.7 is slowly going down.

Companies are overfitting their models to the benchmarks. @lmarena_ai has become the go-to evaluation for AI progress. Their last release demonstrates the difficulty of maintaining fair evaluations on @lmarena_ai, despite best intentions. Read more.

Benchmarks collection from Hugging Face
IQ bench changes in just one year. o3 has an IQ of 160, placing it among the top 100,000 smartest people in the world.

Continuous Thought Machines (CTM)
Sakana proposes a new neural architecture (CTM) built from the ground up to use neural dynamics as a core representation for intelligence. With neural dynamics as a first-class citizen, CTM shows some interesting emergent behavior. CTMs are also naturally easier to interpret.
A CTM can decide to think less once it finds a pattern, much as humans do, which saves energy.
https://x.com/hardmaru/status/1921751428508582329
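A rough sketch of what "deciding to think less" can look like in code: keep running internal steps and stop as soon as the prediction is confident enough. This is not Sakana's implementation. It assumes certainty is measured as one minus normalized entropy, and `toy_step`, `think`, `max_ticks`, and `threshold` are hypothetical names and values chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def certainty(probs):
    """1 - normalized entropy: 1.0 for a one-hot prediction, 0.0 for a uniform one."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def think(step_fn, max_ticks=50, threshold=0.8):
    """Run internal ticks until the prediction is certain enough or the budget runs out."""
    probs = None
    for tick in range(1, max_ticks + 1):
        probs = softmax(step_fn(tick))
        if certainty(probs) >= threshold:
            break  # confident enough: stop early and save compute
    return probs, tick


def toy_step(tick):
    """Hypothetical model: evidence for class 0 grows with every tick."""
    return [0.5 * tick, 0.0, 0.0]


probs, ticks_used = think(toy_step)
print(ticks_used, probs)
```

Easy inputs cross the threshold after a few ticks, while harder ones use more of the budget, which is where the energy saving comes from.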
OpenAI acquires Windsurf for $3B, completing the hilarious pattern of an Ouroboros. Are we in an AI bubble?

Insights on OpenAI buying Windsurf and appointing a CEO of applications: applications are becoming routers across different models. This acquisition reduces multiplexing to other models and enables full, seamless vertical integration.
Insights on why the AI wave is different, including Cursor's rise from $100M to $300M ARR in a few months and a thesis for why.
The Zeki Data report shows that AI tools are disrupting traditional hiring. Below-zero net hiring this year means there were more layoffs than hires.

Another chart shows how the flow of talent between the US and India is reversing.

https://x.com/andrew_n_carr/status/1913603612430983665
Rich Sutton on AI alignment and Decentralization [15 min video]
"The short version is that I don't agree with AI-safety folks about what question we should be asking. Rather than asking how we can control the goals of the AIs, I think we should be asking how we can have a good future without controlling their goals (just as we have a pretty good present without controlling other peoples' goals)." - Richard Sutton
https://www.youtube.com/watch?v=Hnt-oBA086U&t=85s
https://x.com/goyal__pramod/status/1921944575842644206
When to use an OpenAI model? OpenAI finally published a guide that explains when to use which model. Very useful: at least until GPT-5 is out, we'll continue using multiple GPT models.



The Vesuvius Challenge found the title of a scroll for the first time! The scroll is "On Vices, Book 1" by Philodemus. Read more.

https://x.com/frantzfries/status/1920199640021971059
The Intelligence Curse: in the April release of the Socratic AI we examined ai-2027.com and AI 2045. This blog post, like the others, explores what will happen once AGI is here and how to avoid a disaster.
https://x.com/luke_drago_/status/1915376929542111353
A GPT model stopped learning Croatian 🇭🇷 and nobody could figure out why. It turns out Croatian users (HRLF) were more prone to downvote messages. Lol. Read more.

TikTok, Google, and Meta can run human experiments at scale. Is that good or bad? Read about any famous psychological experiment and the sample size is around 40 people. Meanwhile, ByteDance has a sample size of 2B people. Read more.

https://x.com/kentskooking/status/1922570670132604967
https://x.com/kentskooking/status/1921464932119286053