Updates — Voices from the AI Socratic Community | AI Socratic

May 2025

May 20, 2025 | Random

AI Dinner 10.0 at Greycroft, May 21st

Another AI dinner at the Greycroft office. The focus this month will be on A2A, the top 30 Ilya Sutskever papers, AlphaEvolve, and the latest in AI. We'll use the blog post you're reading right now to structure our conversations!

AI NYC is a community of AI researchers, engineers, and founders. We meet once a month for a symposium, running Socratic discussions on research papers and LLMs and philosophizing about the latest in AI.

Federico Ulfo

May 20, 2025 | Random

Two major June AI conferences: AI Engineer SF vs AI Tech Week NY

There are two major conferences happening next month. To decide which one to attend, I scraped the speakers and events from both and put them in a Google Sheet, which you can check here: https://docs.google.com/spreadsheets/d/13Z2PKyFtbQaZm8iDguzxTZ4gukZnAI2BCg9-cVJFqfA/edit?gid=2008052934#gid=2008052934

I decided to go to SF this time around. The main reason is that the conference brings the best AI speakers together in one place, while AI Tech Week is scattered across many venues and has way too much noise.

AI Engineer, SF, June 3-5, https://www.ai.engineer. Here's a discount code for you: THANKSFEULF.
AI Tech Week, NY, June 2-8, https://www.tech-week.com.

Federico Ulfo

May 20, 2025 | Models

Google I/O 2025: Gemini agents, AI-first Android, and new hardware

Google I/O 2025 doubled down on Gemini-powered agents, AI-first Android, and a dash of new hardware: the clearest signal yet that “Google does the Googling” for you.

Key Highlights

AI Mode for Search – rolls out to all U.S. users, running dozens of Gemini-driven sub-queries and soon tapping Project Mariner to carry out up to 10 web tasks with a Teach-and-Repeat workflow.
Gemini 2.5 & Chrome – Deep Think reasoning mode lands for complex math/code, while Gemini comes natively to Chrome for tab-wide summarization and navigation.
Imagen 4, Veo 3 & Flow – next-gen image and video models plus the Flow AI filmmaking app let creators stitch 8-second clips into longer AI movies.
Project Astra upgrades – the multimodal agent goes proactive with Search Live, speaking up unprompted and handling tasks as you point your camera.
Android 16 preview – Material 3 Expressive redesign, AI weather-reactive wallpapers, scam-call shields, Private Space, and system-wide Gemini hooks.
Wear OS 6 – gets the same Material 3 flair, an adaptive circular UI, and a 10% battery bump for Pixel Watch and beyond.
Project Aura XR glasses – Xreal partnership teases wide-FOV smart glasses with on-device Gemini assistance.

Google shipped really hard with this.

Federico Ulfo

May 20, 2025 | Agents

DeepMind AlphaEvolve: evolutionary coding agent discovers new algorithms

AlphaEvolve was received positively after its May 14, 2025 reveal. Powered by Gemini 2.0 models, the evolutionary coding agent discovers and refines algorithms that are already saving compute, speeding up hardware, and cracking open math problems.

Key Highlights

Superhuman Algorithms – beats the 56-year-old Strassen method for 4 × 4 complex matrix multiplication.
Compute Savings – a new Borg scheduling heuristic recovers ≈ 0.7% of Google’s global compute fleet.
Hardware & Training Boosts – 23% faster Gemini kernel (1% shorter training) and a 32.5% FlashAttention speed-up; a lean Verilog redesign ships in next-gen TPUs.
Evolutionary Engine – pairs Gemini Flash for breadth with Gemini Pro for depth, guided by automated evaluators.
Broad Discovery – improved on about 20% of 50+ open math problems and rediscovered the best known results in about 75% of them.
Early Access – academic EAP sign-ups open, wider rollout under exploration.
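To put the "Superhuman Algorithms" highlight in numbers, here is a small sketch that counts scalar multiplications for 4 × 4 matrix products. The naive and recursive-Strassen counts follow from the standard formulas; the figure of 48 is the count DeepMind reported for AlphaEvolve's 4 × 4 complex-valued algorithm and is hard-coded here as a constant, not derived. The function names are made up for this illustration.

```python
def naive_multiplications(n: int) -> int:
    """Scalar multiplications used by the schoolbook n x n algorithm."""
    return n ** 3


def strassen_multiplications(n: int) -> int:
    """Scalar multiplications when Strassen's 2 x 2 scheme is applied recursively.

    Each level replaces 8 block products with 7, so the count is 7 ** log2(n).
    Only defined here for n that is a power of two.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a positive power of two")
    if n == 1:
        return 1
    return 7 * strassen_multiplications(n // 2)


# Count reported by DeepMind for AlphaEvolve's 4 x 4 complex-valued algorithm
# (taken from the announcement, not computed by this script).
ALPHAEVOLVE_REPORTED_4X4 = 48

if __name__ == "__main__":
    print("naive 4x4:", naive_multiplications(4))
    print("recursive Strassen 4x4:", strassen_multiplications(4))
    print("AlphaEvolve (reported):", ALPHAEVOLVE_REPORTED_4X4)
```

Saving a single multiplication looks small, but a scheme like this can be applied recursively to larger matrices, which is why the base count matters.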

How It Shines

Provably Novel – solutions are mathematically verified as new, not memorized.
Real-World Impact – live in data-centers, chip design, and LLM training pipelines today.
Engineer-Friendly – outputs human-readable code, easing adoption and debugging.
Open Horizons – same framework targets materials science, drug discovery, sustainability, and more.

AlphaEvolve is DeepMind’s boldest leap toward AI-driven scientific discovery—an agent that literally evolves code, freeing humans to focus on bigger ideas.
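The idea of "evolving" candidates under an automated evaluator can be sketched with a toy loop. This is not AlphaEvolve's implementation: in the real system, Gemini models propose code edits, whereas here a random mutation stands in for the model and the "program" is just a list of integers. All names (`evaluate`, `mutate`, `evolve`) and the target-matching task are assumptions made for illustration.

```python
import random


def evaluate(candidate, target):
    """Automated evaluator: higher is better (negative squared error)."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng):
    """Stand-in for an LLM-proposed edit: nudge one position by +/-1."""
    child = list(candidate)
    index = rng.randrange(len(child))
    child[index] += rng.choice([-1, 1])
    return child


def evolve(target, generations=200, population_size=8, seed=0):
    """Keep the best candidates each generation (parents survive, so the best score never drops)."""
    rng = random.Random(seed)
    population = [[0] * len(target) for _ in range(population_size)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        ranked = sorted(population + children, key=lambda c: evaluate(c, target), reverse=True)
        population = ranked[:population_size]
    return population[0]


if __name__ == "__main__":
    target = [3, -2, 5]
    best = evolve(target)
    print(best, evaluate(best, target))
```

The important design choice is that selection depends only on the evaluator's score, so any proposal mechanism can be swapped in without changing the loop.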

https://youtu.be/vC9nAosXrJw?si=pu3UjCJzJYImRgn-

In this episode of Machine Learning Street Talk, the team that worked on AlphaEvolve goes into the details of the breakthrough and shares their insights.

Federico Ulfo

May 20, 2025 | Agents

OpenAI Codex: cloud SWE agent built on codex-1 reasoning model

Codex is powered by codex-1, a version of OpenAI's o3 reasoning model fine-tuned for software engineering tasks.

Key Highlights

Parallel Tasking: Executes writing features, bug fixes, tests, and codebase queries concurrently in isolated cloud sandboxes, with tasks completing in 1–30 minutes.
Verifiable Actions: Every task provides terminal logs, test outputs, and change citations for transparent review & integration.
Configurable Agent Behavior: Use AGENTS.md files to instruct Codex on codebase navigation, testing commands, and project conventions.
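The "Parallel Tasking" idea, several independent tasks each running in its own isolated workspace, can be illustrated with a minimal sketch. This is not the Codex API: the task names, `run_task`, and `run_all` are hypothetical, a temporary directory stands in for a cloud sandbox, and writing a log file stands in for real work.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(name: str) -> dict:
    """Run one task in its own throwaway directory and return its log for review."""
    with tempfile.TemporaryDirectory() as workdir:
        log_path = Path(workdir) / "task.log"
        # Placeholder for real work (editing files, running tests, and so on).
        log_path.write_text(f"ran {name}\n")
        return {"task": name, "log": log_path.read_text().strip()}


def run_all(task_names):
    """Execute tasks concurrently; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_task, task_names))


if __name__ == "__main__":
    for result in run_all(["fix flaky test", "add retry logic", "explain module layout"]):
        print(result)
```

Each task gets a fresh directory that is deleted when the task finishes, so one task cannot see another's files, and the returned log is what a reviewer would inspect.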

How It Shines

Coding Proficiency: Excels on internal SWE-Bench Verified evaluations, delivering clean, review-ready patches. (LinkedIn)
Autonomous Collaboration: Proposes pull requests, refactors large codebases, and answers complex code queries independently. (WSJ, @EconomicTimes)
Security-Focused: Runs within sandboxed containers with no internet access during execution and is trained to refuse malicious software requests. (@EconomicTimes)
Async Workflow Revolution: Shifts development from linear task queues to parallel AI-driven task delegation, keeping engineers in flow longer. (Medium)

Codex launched as a research preview to gather feedback, prioritize safety, and iterate rapidly in one of the most competitive spaces, alongside GitHub Copilot, Google Gemini, Anthropic Claude, and emerging startups.

OpenAI’s vision is a unified developer experience where real-time pairing and asynchronous agent workflows converge: imagine editing code in your IDE, spawning Codex tasks on demand, and receiving progress updates and results without context switching. (LinkedIn)

Codex is reportedly more capable at handling multi-step parallel coding tasks than standalone o3-based code models. In my experience, for quick suggestions Copilot still feels snappier, but Codex’s parallelism is unmatched when you need to orchestrate complex refactors & testing pipelines.

https://www.youtube.com/watch?time_continue=6&v=wSAkqlzSZyw

"Your App Is Just A ChatGPT Wrapper" they said!

https://x.com/t31kx/status/1921214839961067734

Federico Ulfo

May 20, 2025 | Random

Pope Leo XIV chose his name because of AI

The pope actually chose the name Leo XIV because of AI: https://x.com/VaticanNews/status/1921186921838997935.

Federico Ulfo

May 20, 2025 | Random

Sam Altman and Jony Ive hint at a new personal AI product

Sam Altman and Jony Ive hint at a new personal AI product: https://x.com/sama/status/1925242282523103408.

Federico Ulfo

May 20, 2025 | Models

LLM Models Vibe Check & Benchmarks: OpenRouter, lmarena, and IQ

Top models according to OpenRouter: notably, Gemini 2.5 is climbing the ladder, while Anthropic's Claude 3.7 is slowly going down.

Companies are overfitting their models to the benchmarks. @lmarena_ai has become the go-to evaluation for AI progress. Their last release demonstrates the difficulty of maintaining fair evaluations on @lmarena_ai, despite best intentions. Read more.

Benchmarks collection from Hugging Face

IQ bench changes in just one year. o3 has an IQ of 160, placing it among the 100,000 smartest people in the world.

Federico Ulfo

May 20, 2025 | Research

Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

Absolute Zero: Reinforced Self-Play Reasoning with Zero Data. The AI learns to reason by inventing and solving its own Python coding challenges using RL, with no human data needed. Author explanation: https://x.com/_AndrewZhao/status/1919920459748909288.
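A toy sketch of the propose-and-solve loop may help make the idea concrete. It is not the paper's training code: here a random generator stands in for the proposer, a plain function stands in for the solver, and no RL update happens. The one faithful ingredient is that a Python executor, not human labels, decides whether an answer is correct. All function names are assumptions for this illustration.

```python
import random


def run_program(source: str, arg: int) -> int:
    """Execute a proposed program and return f(arg); the executor is the ground truth."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"](arg)


def propose_task(rng: random.Random):
    """Stand-in proposer: invent a small program plus an input for it."""
    a, b = rng.randint(1, 9), rng.randint(0, 9)
    source = f"def f(x):\n    return {a} * x + {b}\n"
    return source, rng.randint(0, 20)


def self_play_round(solver, rng: random.Random) -> float:
    """One round: propose a task, let the solver predict the output, score it by execution."""
    source, arg = propose_task(rng)
    expected = run_program(source, arg)
    answer = solver(source, arg)
    return 1.0 if answer == expected else 0.0


def oracle_solver(source: str, arg: int) -> int:
    """A solver that always gets it right (it simply runs the program)."""
    return run_program(source, arg)


def lazy_solver(source: str, arg: int) -> int:
    """A solver that always answers -1, which is never correct for these tasks."""
    return -1


if __name__ == "__main__":
    rng = random.Random(0)
    print(sum(self_play_round(oracle_solver, rng) for _ in range(10)))
    print(sum(self_play_round(lazy_solver, rng) for _ in range(10)))
```

In a real setup the reward from each round would feed an RL update for the model playing both roles; the sketch only shows where that verifiable reward comes from.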

Federico Ulfo

May 20, 2025 | Research

Flow-GRPO

Flow-GRPO

Federico Ulfo

May 20, 2025 | Research

ZeroSearch: incentivizing search in LLMs without searching

ZeroSearch: incentivizing search in LLMs without searching. ZeroSearch is a curriculum-based RL framework that teaches LLMs to retrieve information using LLM-generated documents instead of calls to a real search engine: https://x.com/omarsar0/status/1920469148968362407

Federico Ulfo

May 20, 2025 | Research

Sakana's Continuous Thought Machines (CTM) architecture

Continuous Thought Machines (CTM)

Sakana proposes a new neural architecture (CTM) built from the ground up to use neural dynamics as a core representation for intelligence. With neural dynamics as a first-class citizen, CTM shows some interesting emergent behavior, and CTMs are naturally easier to interpret.

A CTM can decide to think less once it finds a pattern, much as humans do, which saves energy.
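The "think less once it finds a pattern" behavior can be pictured as an early-stopping loop over internal thinking steps. The sketch below is an illustration of adaptive computation in general, not Sakana's code: it assumes the model emits a probability vector at each step and uses one minus normalized entropy as a certainty score. The threshold and function names are made up for this example.

```python
import math


def certainty(probs):
    """1 - normalized entropy: 0.0 for a uniform guess, 1.0 for a one-hot prediction."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def think_until_confident(predictions_per_step, threshold=0.8):
    """Return (steps_used, prediction), stopping at the first sufficiently certain step."""
    if not predictions_per_step:
        raise ValueError("need at least one thinking step")
    for step, probs in enumerate(predictions_per_step, start=1):
        if certainty(probs) >= threshold:
            return step, probs
    # Never confident enough: fall back to the final step's prediction.
    return len(predictions_per_step), predictions_per_step[-1]


if __name__ == "__main__":
    steps = [
        [0.25, 0.25, 0.25, 0.25],  # no idea yet
        [0.40, 0.30, 0.20, 0.10],  # a weak hunch
        [0.97, 0.01, 0.01, 0.01],  # pattern found
        [0.98, 0.01, 0.005, 0.005],
    ]
    used, prediction = think_until_confident(steps)
    print(used, prediction)
```

Easy inputs cross the threshold early and skip the remaining steps, which is where the compute savings would come from.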

https://x.com/hardmaru/status/1921751428508582329

Federico Ulfo

May 20, 2025 | Fundraising & Startups

OpenAI acquires Windsurf for $3B

OpenAI acquires Windsurf for $3B, completing the hilarious pattern of an Ouroboros. Are we in an AI bubble?

Insights on OpenAI buying Windsurf and appointing a CEO of Applications: applications are becoming routers across different models. This acquisition reduces the multiplexing to other models and enables a full, seamless vertical integration.

Federico Ulfo

May 20, 2025 | Fundraising & Startups

Why the AI wave is different: Cursor's rise from $100M to $300M ARR

Insights on why the AI wave is different, prompted by Cursor's rise from $100M to $300M ARR in a few months. The thesis:

The AI wave is different from the cloud wave.
AI is being bought, while SaaS was being sold. AI products are pulled by customers, so they grow faster.
A data advantage might be the only (or ultimate) moat in AI. GitHub Copilot had the data, distribution, and resource advantages, yet Cursor is still winning thanks to better UX.

Federico Ulfo

May 20, 2025 | Macro & Geopolitics

Zeki Data Report: US to become a net exporter of AI talent in 2025

The Zeki Data report shows that AI tools are disrupting traditional hiring. Below-zero net hiring this year means there were more layoffs than hires.

Another chart shows how the inflows and outflows of talent between the US and India are shifting in the other direction.

Federico Ulfo

May 20, 2025 | Research

How LLMs do arithmetic

How LLMs do arithmetic, lol

https://x.com/andrew_n_carr/status/1913603612430983665

Federico Ulfo

May 20, 2025 | Videos & Podcasts

Rich Sutton on AI alignment and Decentralization [15 min video]

"The short version is that I don't agree with AI-safety folks about what question we should be asking. Rather than asking how we can control the goals of the AIs, I think we should be asking how we can have a good future without controlling their goals (just as we have a pretty good present without controlling other peoples' goals)." - Richard Sutton

https://www.youtube.com/watch?v=Hnt-oBA086U&t=85s

Federico Ulfo

May 20, 2025 | Random

Random thought tweet from @goyal__pramod

https://x.com/goyal__pramod/status/1921944575842644206

Federico Ulfo

May 20, 2025 | Models

OpenAI publishes a guide on when to use which model

When should you use which OpenAI model? OpenAI finally published a guide that explains it. This is very useful: at least until GPT-5 is out, we'll keep using multiple GPT models.

OpenAI's New Roadmap — AI for Education

Federico Ulfo

May 20, 2025 | Research

Vesuvius Challenge finds a scroll title for the first time

Vesuvius Challenge found the title of a scroll for the first time! The scroll is "On Vices, Book 1" by Philodemus. Read more.

https://x.com/frantzfries/status/1920199640021971059

Federico Ulfo

May 20, 2025 | Philosophy & Ethics

The Intelligence Curse: exploring how to avoid an AGI disaster

The Intelligence Curse: in the April edition of AI Socratic we examined ai-2027.com and AI 2045. This essay, like the others, explores what will happen when AGI arrives and how to avoid a disaster.

https://x.com/luke_drago_/status/1915376929542111353

Federico Ulfo

May 20, 2025 | Fun

GPT model stopped learning Croatian due to downvoting users

A GPT model stopped learning Croatian 🇭🇷 and nobody could figure out why. It turns out Croatian users (HRLF) were more prone to downvote messages. Lol. Read more.

Federico Ulfo

May 20, 2025 | Philosophy & Ethics

TikTok, Google, Meta can run human experiments at scale

TikTok, Google, and Meta can run human experiments at scale. Is that good or bad? Look up any famous psychology experiment and the sample size is around 40 people; meanwhile, ByteDance has a sample size of 2B people. Read more.
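To get a feel for what that gap in sample size means statistically, here is a rough sketch using the textbook margin-of-error formula for a proportion at 95% confidence. It assumes simple random sampling and a worst-case proportion of 0.5, which real platform experiments will not match exactly; the function name is made up for this example.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    lab = margin_of_error(40)
    platform = margin_of_error(2_000_000_000)
    print(f"n = 40:  +/- {lab:.1%}")
    print(f"n = 2B:  +/- {platform:.4%}")
    print(f"ratio:   {lab / platform:,.0f}x tighter")
```

Because the error shrinks with the square root of the sample size, a platform-scale experiment can detect effects thousands of times smaller than a 40-person lab study could.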

Federico Ulfo

May 20, 2025 | Vibe Coding

JSON uses many more tokens than alternative formats

JSON uses many more tokens than alternative formats. Read more.
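A quick way to see why is to serialize the same records as JSON and as CSV and compare sizes. The sketch below uses character count as a crude proxy for tokens; actual token counts depend on the tokenizer, so treat the numbers as directional only. The sample records and function names are invented for this example.

```python
import csv
import io
import json

# Hypothetical sample data, just for the comparison.
records = [
    {"id": 1, "name": "Ada", "role": "engineer"},
    {"id": 2, "name": "Grace", "role": "researcher"},
    {"id": 3, "name": "Alan", "role": "founder"},
]


def to_json(rows) -> str:
    """Serialize rows as JSON; every key is repeated for every row."""
    return json.dumps(rows)


def to_csv(rows) -> str:
    """Serialize rows as CSV; keys appear once, in the header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    json_text = to_json(records)
    csv_text = to_csv(records)
    print("JSON characters:", len(json_text))
    print("CSV characters: ", len(csv_text))
```

Most of the JSON overhead comes from repeating every key, plus quotes and braces, on every row, so the gap widens as the number of rows grows.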

Federico Ulfo

May 20, 2025 | Fun

Loops you can take home to your mother

replace the loops with downloading the videos

https://x.com/kentskooking/status/1922570670132604967

https://x.com/kentskooking/status/1921464932119286053

Federico Ulfo
