Skip to main content
AI Socratic
March 2025
Agents

Anthropic's Model Context Protocol (MCP) 🔥

It’s clear that AI is made out of hype cycles, and this time it’s MCP taking the stage. The Model Context Protocol (MCP) is a server/client protocol proposed by Anthropic to integrate large language models (LLMs) with external applications.

The adoption has started to pick up, with many big players jumping on board, including Perplexity AI.

The simplest way to understand MCP is to build a simple one—it takes 15 minutes at most, follow this guide or ask Claude to write it for you.

One of the most interesting project in the MCP world is OpenTools a registry to search MCPs.

Federico UlfoFederico Ulfo
Models

Anthropic Claude Sonnet 3.7 (Aurora) 🛸

Codename Aurora 🛸 🌌

On March 10, 2025, Anthropic introduced Claude Sonnet 3.7, dubbed "Aurora," marking it as their most sophisticated model yet. Designed with a focus on seamless human-like interaction, enhanced interpretability, and superior safety features, Anthropic continues to push boundaries in AI development. Alongside the release, they published a Transparency Report detailing safety protocols, performance metrics, and training insights.

Here are the highlights of Claude Sonnet 3.7:

Anthropic API Update

This is not the only update from Anthropic, they also just improved their API via prompt caching, reducing the amount of tokens and therefore inference cost 🔥!

Federico UlfoFederico Ulfo
Models

OpenAI releases GPT-4.5 (Orion) 🚀

Codename Orion 🚀 🌌

On February 27, 2025, OpenAI launched GPT-4.5, codenamed "Orion," as its most advanced model to date. GPT-4.5 focuses on natural conversation, reduced hallucinations, and improved general-purpose capabilities. OpenAI started using a System Card to share details including safety, benchmark, and training strategy.

Here are the highlights of GPT 4.5:

  • Performance: Excels in factual accuracy (PersonQA: 0.78) and multilingual tasks (MMLU English: 0.896), with a lower hallucination rate (0.19), helicone.ai.

  • New Base Model: Replaces previous models as the core for OpenAI’s reasoning systems, emphasizing natural dialogue.

  • Limitations: Strong in practical tasks but lags in complex reasoning compared to specialized models (vellum.ai)

  • Rollout: The rollout was staggered due to GPU shortages, with plans to expand access to other tiers.

Federico UlfoFederico Ulfo
Models

Google Flash 2.0 Image Generation, Gemma 3 27B

Google has been on a role! Releasing image generation for Gemini Flash 2.0 and releasing Gemma 3 27B, the new king in the small model ELO arena.

Google Flash 2.0 Image Generation

Launched March 12, 2025, Gemini 2.0 Flash by Google introduces native image generation within the model. It enables fast, context-aware image creation and editing, with a strong text rendering and storytelling consistency. We played with it, is quite fast and consistent, although in our experience the quality of the image is relatively low.

Gemma 3 27B — King of SLMs 👑

Released the same day, Gemma 3 27B is Google’s largest open-source model, built from Gemini research. With 27 billion parameters, it handles text and image inputs, supports a 128K token context, and excels in multilingual tasks and reasoning. Trained on 14T tokens, it runs efficiently on a single GPU and pairs with ShieldGemma 2 for safety. It’s ideal for developers seeking customizable, high-performance AI.

Federico UlfoFederico Ulfo
Models

Baidu ERNIE-4.5 — DeepSeek R1 level at half the price 👸

Another Small Language Model from the Chinese 🇨🇳 Baidu. This model is a DeepSeek R1 level but at half the cost.

The benchmark reported by Baidu shows that Ernie-4.5 beats 4o in multimodal capability and in some benchmark even gpt-4.5 in text capability — models today are getting overfitted for winning benchmark, running this model by hand generally shows a different story, and well, you can try it yourself here https://yiyan.baidu.com/ (just be aware that your data is getting collected).

Federico UlfoFederico Ulfo
Agents

Manus: DeepResearch + Operator agent from China

Meet Manus, the AI agent from China’s 🇨🇳 Butterfly Effect. Dubbed a 'DeepSeek moment,' it’s a fully autonomous tool that tackles real-world tasks, like building websites, analyzing stocks, and planning trips.

Powered by Claude 3.5 and Qwen, it got some attention on twitter. The GAIA benchmark shows some edge over OpenAI’s Deep Research.

We believe what's more important about this launch is that Chinese research labs are clearly here to innovate and lead, not just follow the US labs.

Federico UlfoFederico Ulfo
Research

The Ultra-Scale Playbook: Training LLMs on GPU Clusters 📈

The "Ultra-Scale Playbook," hosted on Hugging Face by Nanotron (a library for pretraining transformer models), is a comprehensive, open-source guide focused on training LLMs on large scale GPU clusters.

It serves as an educational resource for understanding and implementing advanced techniques to optimize LLM training at scale. This doc covers key concepts such as:

  • 5D parallelism: A framework combining data, tensor, pipeline, sequence, and zero-redundancy parallelism.

  • The ZeRO optimization technique: Designed to reduce memory redundancy and maximize GPU utilization.

  • Fast CUDA kernels for efficient computation, and strategies for overlapping compute and communication to address scaling bottlenecks.

It integrates theoretical explanations with practical insights, supported by interactive plots, over 4,000 scaling experiments, and audio summaries.

Aimed at both beginners and experts, it provides tools, code examples, and detailed discussions on GPU memory optimization and distributed training, making it a valuable resource for those looking to train massive AI models efficiently.

Saying that is fantastic is reductive, trust me!

https://huggingface.co/spaces/nanotron/ultrascale-playbook

Federico UlfoFederico Ulfo
Videos & Podcasts

ML Street Talk — Transformers Need Glasses!

ML Street Talk, is one of my new favorite AI podcast, incredible topic quality and guests.

Federico Barbero discusses why transformers struggle with tasks like counting and copying long text due to architectural bottlenecks and limitations in maintaining information fidelity. He draws comparisons to over-squashing in graph neural networks and highlights the role of the softmax function in these challenges, while also proposing practical modifications to improve transformer performance.

https://www.youtube.com/watch?v=FAspMnu4Rt0

Federico UlfoFederico Ulfo

Search

Search across events, members, and blog posts