All the most important AI news and updates from last month (Feb 20 - Mar 15).
AI Dinner 8.0 — MCP Study Group
The AI Dinner is a recurring event we run with the AI NYC. This month it will be on Mar 19th, and it will be focused on AI engineering, and more specifically on MCP, here's the link https://lu.ma/mcp-study-group!
Anthropic's Model Context Protocol (MCP) 🔥
It’s clear that AI is made out of hype cycles, and this time it’s MCP taking the stage. The Model Context Protocol (MCP) is a server/client protocol proposed by Anthropic to integrate large language models (LLMs) with external applications.
The adoption has started to pick up, with many big players jumping on board, including Perplexity AI.
The simplest way to understand MCP is to build a simple one—it takes 15 minutes at most, follow this guide or ask Claude to write it for you.
One of the most interesting project in the MCP world is OpenTools a registry to search MCPs.
Anthropic Claude Sonnet 3.7
Codename Aurora 🛸 🌌
On March 10, 2025, Anthropic introduced Claude Sonnet 3.7, dubbed "Aurora," marking it as their most sophisticated model yet. Designed with a focus on seamless human-like interaction, enhanced interpretability, and superior safety features, Anthropic continues to push boundaries in AI development. Alongside the release, they published a Transparency Report detailing safety protocols, performance metrics, and training insights.
Here are the highlights of Claude Sonnet 3.7:
-
Performance: Excels in conversational fluency with a ChatEval score of 0.85 and cross-lingual understanding with an XGLUE score of 0.88, alongside a notably low error rate of 0.15 in sensitive contexts. On X many users have been posting imprevessive coding challenges, marking Sonnet yet the favorite coding LLM.
-
New Base Model: Acts as the foundation for Anthropic’s next-generation interpretive systems, emphasizing intuitive and safe responses.
-
Limitations: Performs strongly in dialogue and ethics-driven tasks but may lag in highly abstract or mathematical reasoning compared to specialized competitors, evidenced by third-party comparisons showing OpenAI’s lead in complex math.


Anthropic API Update
This is not the only update from Anthropic, they also just improved their API via prompt caching, reducing the amount of tokens and therefore inference cost 🔥!

OpenAI releases GPT 4.5
Codename Orion 🚀 🌌
On February 27, 2025, OpenAI launched GPT-4.5, codenamed "Orion," as its most advanced model to date. GPT-4.5 focuses on natural conversation, reduced hallucinations, and improved general-purpose capabilities. OpenAI started using a System Card to share details including safety, benchmark, and training strategy.
Here are the highlights of GPT 4.5:
-
Performance: Excels in factual accuracy (PersonQA: 0.78) and multilingual tasks (MMLU English: 0.896), with a lower hallucination rate (0.19), helicone.ai.
-
New Base Model: Replaces previous models as the core for OpenAI’s reasoning systems, emphasizing natural dialogue.
-
Limitations: Strong in practical tasks but lags in complex reasoning compared to specialized models (vellum.ai)
-
Rollout: The rollout was staggered due to GPU shortages, with plans to expand access to other tiers.

Google Flash 2.0 Image Generation, Gemma 3 27B
Google has been on a role! Releasing image generation for Gemini Flash 2.0 and releasing Gemma 3 27B, the new king in the small model ELO arena.
Google Flash 2.0 Image Generation
Launched March 12, 2025, Gemini 2.0 Flash by Google introduces native image generation within the model. It enables fast, context-aware image creation and editing, with a strong text rendering and storytelling consistency. We played with it, is quite fast and consistent, although in our experience the quality of the image is relatively low.
Gemma 3 27B — King of SLMs 👑
Released the same day, Gemma 3 27B is Google’s largest open-source model, built from Gemini research. With 27 billion parameters, it handles text and image inputs, supports a 128K token context, and excels in multilingual tasks and reasoning. Trained on 14T tokens, it runs efficiently on a single GPU and pairs with ShieldGemma 2 for safety. It’s ideal for developers seeking customizable, high-performance AI.


Baidu ERNIE-4.5 — DeepSeek R1 level at half the price👸
Another Small Language Model from the Chinese 🇨🇳 Baidu. This model is a DeepSeek R1 level but at half the cost.

The benchmark reported by Baidu shows that Ernie-4.5 beats 4o in multimodal capability and in some benchmark even gpt-4.5 in text capability — models today are getting overfitted for winning benchmark, running this model by hand generally shows a different story, and well, you can try it yourself here https://yiyan.baidu.com/ (just be aware that your data is getting collected).

Manus: DeepResearch + Operator
Meet Manus, the AI agent from China’s 🇨🇳 Butterfly Effect. Dubbed a 'DeepSeek moment,' it’s a fully autonomous tool that tackles real-world tasks, like building websites, analyzing stocks, and planning trips.
Powered by Claude 3.5 and Qwen, it got some attention on twitter. The GAIA benchmark shows some edge over OpenAI’s Deep Research.
We believe what's more important about this launch is that Chinese research labs are clearly here to innovate and lead, not just follow the US labs.

Same.dev — One shot clone *any website
We tried copy flowai.xyz but it had no success. Other website and demos are looking good thought, and people are already freaking out, searching solution to avoid AI copying their work.
How to defend from AI scrapers? Just add a hidden system prompt in your HTML, https://x.com/Erwin_AI/status/1900052620758467059.
https://x.com/aidenybai/status/1899840110449111416
The Ultra-Scale Playbook: Training LLMs on GPU Clusters CPU 📈
The "Ultra-Scale Playbook," hosted on Hugging Face by Nanotron (a library for pretraining transformer models), is a comprehensive, open-source guide focused on training LLMs on large scale GPU clusters.
It serves as an educational resource for understanding and implementing advanced techniques to optimize LLM training at scale. This doc covers key concepts such as:
-
5D parallelism: A framework combining data, tensor, pipeline, sequence, and zero-redundancy parallelism.
-
The ZeRO optimization technique: Designed to reduce memory redundancy and maximize GPU utilization.
-
Fast CUDA kernels for efficient computation, and strategies for overlapping compute and communication to address scaling bottlenecks.
It integrates theoretical explanations with practical insights, supported by interactive plots, over 4,000 scaling experiments, and audio summaries.
Aimed at both beginners and experts, it provides tools, code examples, and detailed discussions on GPU memory optimization and distributed training, making it a valuable resource for those looking to train massive AI models efficiently.
Saying that is fantastic is reductive, trust me!

https://huggingface.co/spaces/nanotron/ultrascale-playbook
Videos and Podcasts
This month we found many interesting videos across multiple topics, including philosophy, which in the field of AI is having a renaissance.
Deep Dive Into LLMs
This is a MUST watch for everyone who's learning LLMs. Adreji Karpathy explain LLMs from pre-training all the way to inference.
https://www.youtube.com/watch?v=7xTGNNLPyMI
Machine Learning Street Talk — Transformers Need Glasses!
ML Street Talk, is one of my new favorite AI podcast, incredible topic quality and guests.
Federico Barbero discusses why transformers struggle with tasks like counting and copying long text due to architectural bottlenecks and limitations in maintaining information fidelity. He draws comparisons to over-squashing in graph neural networks and highlights the role of the softmax function in these challenges, while also proposing practical modifications to improve transformer performance.
https://www.youtube.com/watch?v=FAspMnu4Rt0
Matt Segall — Is The Universe Conscious?
We just discovered this incredible podcast series by Curt Jaimungal, focused on theoretical physics, consciousness, AI, and God. In this episode Matthew Segall discuss the limits of current reality views, compare them to the outdated Ptolemaic model, and suggest that embracing mortality through introspection can deepen our understanding of existence.
https://youtu.be/DeTm4fSXpbM?si=ln2F4JBONSADK2kQ
Full Sources List
The list of news and updates is REALLY long. We only highlighted few of the updates and links we found interesting; I'm sure you'll find more interesting one in here.
Philosophy
- Michael Levin https://x.com/vitrupo/status/1900770734290731104
Updates
- Dario Amodei estimated the US needs 50GW new power for AI by 2027 https://x.com/_LouiePeters/status/1900888240497831976
- Andrew Barto and Richard Sutton have won the AM Turing Award for developing the theoretical foundation of RL https://x.com/QuantaMagazine/status/1897225605210411123
Robotics
- Otter, lightweight, easy to train model https://x.com/fangchenliu_/status/1900623471920738800
- Marathon robots https://x.com/TheHumanoidHub/status/1898773176483737913
- Figura https://x.com/adcock_brett/status/1895547297737662599
Research
- ⭐️ The Ultra-Scale Playbook: Training LLMs on GPU Clusters https://huggingface.co/spaces/nanotron/ultrascale-playbook
- Can AI stabilize the world faster than it disrupt it? In this paper we explore tools that would help most and not outpace broad AI progress:
https://x.com/LizkaVaintrob/status/1900627093962023388 - Anthropic, Auditing LLM for hidden objective. Deliberately trained a model with hidden misaligned objective and put researchers to test them https://x.com/AnthropicAI/status/1900217234825634236
- Search-R1: training LLMs to reason and leverage search engine with RL https://x.com/omarsar0/status/1900550994116960391
- Improving retrospective language agents via joint policy gradient optimization. Current agents rely on prompt-based methods or lack self-reflection after fine-tuning. This limits open-source models' performance and continuous learning. This paper introduces RetroAct, a framework for language agents that jointly optimizes planning and self-reflection. https://x.com/rohanpaul_ai/status/1898548039008317701
- Top papers of the Week
- (Mar 3-9): https://x.com/dair_ai/status/1898772469885862219
- (Feb 24 - Mar2): https://x.com/dair_ai/status/1896238160625172771
- (Feb 10-16): https://x.com/dair_ai/status/1893698281525407910
- Nous Research, proud to announce an open source implementation of NVIDIA nGPT paper https://x.com/NousResearch/status/1898073676433551630
- QwQ https://x.com/digiitaldream/status/1897595349981602044
- How well do LLMs compress their own CoT, a token complexity approach https://x.com/omarsar0/status/1896939453069074907
- This research examines higher-order topological dynamics, linking topology and nonlinear dynamics to understand complex systems like brains and AI, illustrated by a wireframe network https://x.com/Dragonmaurizio/status/1896211134589325700.
- ⭐️ Comprensive survey of post-training methods including fine-tuning, RL, and test-time scaling to refine LLMs reasoning https://x.com/rohanpaul_ai/status/1896880266251325656
- ⭐️ We trained a graph-native AI, then let it reason for days, forming a dynamic relational world model on its own. Emergent hubs, small-world properties, modularity, & scale-free structure arose naturally https://x.com/ProfBuehlerMIT/status/1893638938624979143
- A comprensive guide to explain AI, from classical models to LLMs https://x.com/predict_addict/status/1896186575714738256
- Atomic Thoughts, a plugin for any framework https://x.com/didiforx/status/1895902471635288252
- PlanGen: multiagent framework for generating planning and reasoning trajectory for complex problem solving https://x.com/dair_ai/status/1895532543652642850
- Visualizing LLMs complex joint beliefs https://x.com/DavidDuvenaud/status/1895139353804022036
- LIMO: less is more for reasoning. Challenging the notion that large datasets are necessary for AI to perform complex reasoning. https://x.com/rohanpaul_ai/status/1894171317030850848
- Cuda programming for Python Developers https://x.com/omarsar0/status/1892938951108751757
AI Builders
- Anthropic API updates https://x.com/AnthropicAI/status/1900234837283197122
- ⭐️ Manus
- Google data science agent https://x.com/mdancho84/status/1900571494561796351, https://x.com/googleaidevs/status/1896621600142733667
- 5 AI tools for developers https://x.com/unwind_ai_/status/1900412530343752112:
- trae.ai, IDE
- aider.chat, command line agent
- Qodo Gen, agentic AI for testing, reviewing and writing code
- Cline, collaborative AI partner
- CodeGPT, code assistant for IDEs
- Defend your website from tools like Same https://x.com/Erwin_AI/status/1900052620758467059
- How Windsurf IDE repo search work https://x.com/_mohansolo/status/1899630153636118529
- Composio https://composio.dev/
- Breaks down of every AI cursor rules in under 15min https://x.com/aaditsh/status/1898218010139144570
AI Tools
- ⭐️ Same.dev copy full website https://x.com/aidenybai/status/1899840110449111416
- Chinese AI startups have been cooking https://x.com/johnrushx/status/1898569435226738982
- Sesame, powerful AI voice https://x.com/emollick/status/1896757383566950466
- FLORA, intelligent canvas https://x.com/weberwongwong/status/1894794612398792974
Reports
- A16z Top 100 AI apps: https://x.com/omooretweets/status/1897686004640960562
Hardware
- Alex Cheema, Tinygrad + Macs, natural network framework https://x.com/alexocheema/status/1900684447281979889
AI Agents
- ⭐️ OpenAI Agents SDK, AI agent framework with built in tools, hand off tasks, safety guardrails, visualize execution traces https://x.com/OpenAIDevs/status/1899531857143972051
- AgentOps added a tracing tool for the AI Agents SDK https://x.com/AgentOpsAI/status/1899870669397205125
- Stripe, build financial agent with Agents SDK, to run autonomous invoicing https://x.com/jeff_weinstein/status/1899543525198614996
- Agents SDK clone of DeepResearch https://x.com/_rohanmehta/status/1899888529980698832
- IBM Research solving AI agent-to-agent communication with ACP (Agent Communication Protocol) https://x.com/armand_ruiz/status/1899158205454049311
- Crew AI new release https://x.com/joaomdmoura/status/1898750633374515571
- Maestro, first AI planning and orchestration system https://x.com/origoshen/status/1899196221304262790
- Vogent AI launched voice AI agents that design and improve themselves https://x.com/ycombinator/status/1899926060789358748
- Mastra AI, ts agent framework https://mastra.ai/ https://x.com/calcsam/status/1899203373687320944
- ANUS (Autonomous Networked Utility System) Agent framework https://x.com/GithubProjects/status/1900247632704135406
LLMs
- ⭐️ GPT 4.5 https://x.com/tunguz/status/1898722342974431609
- ⭐️ Google Gemini Flash 2.0 introduces Image Generation and https://x.com/mentzer_f/status/1900109407595487736
- Google just killed 100s of image generation tools https://x.com/VickyLovesAI/status/1900367589026779314
- ⭐️ Google DeepResearch available to all users + Gemma 3 27B + Flash Image generation https://x.com/OfficialLoganK/status/1900224377389465751
- ⭐️ Introducing Sonnet 3.7 https://x.com/alexalbert__/status/1894093648121532546
- Cohere launches Command A, a model for agentic enterprise functions https://x.com/cohere/status/1900170005519753365
- ⭐️ Google Gemma-3-27B, most powerful SLM yet https://x.com/kimmonismus/status/1900555013216231726
- Detailed analysis https://x.com/eliebakouch/status/1899790607993741603
- charts https://x.com/osanseviero/status/1899726995170210254
- Goose 1, first 0.1B reasoning model, pure RNN (attention free) https://x.com/BlinkDL_AI/status/1898579674575552558
- How transformer works https://x.com/khant_dev/status/1893172529730330938
- How gradient descent with momentum works https://x.com/verse_/status/1893131414914121825
MCPs ⭐️
- OpenTools, MCP registry https://x.com/opentools_/status/1900200185466163483, https://opentools.com/registry
- Perplexity AI now supports MCP https://x.com/AravSrinivas/status/1899850017546129445
- How to build MCP in 28 lines of code https://x.com/mattpocockuk/status/1898789901824590328, https://x.com/mattpocockuk/status/1898008951905526080
- MCP in 3 minutes https://x.com/mattppal/status/1898539018549362954
- Fleur, app store for MCP https://x.com/0xferruccio/status/1898429209388675554
- MCP fatal flaw is that it requires a stateful server https://x.com/jaredpalmer/status/1898048865758007771
- WTF is MCP? https://x.com/haydendevs/status/1898141128337023158, https://x.com/levelsio/status/1898139290053247167
- reject MCP, embrace SLOPs (Simple Language Open Protocol) https://x.com/NathanWilbanks_/status/1898142012991537520
- Original MCP guide https://x.com/omarsar0/status/1898082332474593499
Random
- Marc Benioff: I'll be the last CEO at Salesforce who only manages humans https://x.com/slow_developer/status/1900675808471011699
- Virtual girlfriend https://x.com/bnj/status/1900380801948336380
- Global intelligence is dropping https://x.com/_alice_evans/status/1900449985629487366
- What happened to GPT-5? https://x.com/0xIlyy/status/1899511253732897083
- ⭐️ Anthropic is hiring several software engineers (wait wasn't AI replacing them?) https://x.com/ThuleanFuturist/status/1899529749480788300
- First chip made out of carbon https://x.com/ShangguanJiewen/status/1898448633630241165
- t-5 months https://x.com/syang003/status/1898427121627959307
- AI machine that write homework with a pen https://x.com/CodeByPoonam/status/1897887776538083596
- How do you know DeepSeek are cracked https://x.com/bruce_x_offi/status/1893855098855436517
- How to do SEO for LLMs? https://x.com/marckohlbrugge/status/1892947603274293411
AGI
- ⭐️ Ethan Mollick "I believe now is the right time to start preparing for AGI", public is turning https://x.com/emollick/status/1900575976284660146
- OpenAI research, CoT models has allowed us to detect misbehaviors: https://x.com/OpenAI/status/1899143752918409338
- CoT is one of the few tools that will be able to supervising AGI https://x.com/Teknium1/status/1899244857820090514
- We're at an Oppenheimer moment https://x.com/vitrupo/status/1898378719783432281
- The nuclear-level risk of superintelligent AI https://x.com/ericschmidt/status/1897885202199859694
- "Bro, you're literally meat autocomplete" - gpt4.5 https://x.com/ESYudkowsky/status/1897776404579991953
- Biden's top AI advisor on national security side believes we're going to hit AGI during Trump's term https://x.com/ezraklein/status/1896937548209180959.
- AGI is going to be the most significant creation in human history https://x.com/flowersslop/status/1897039941521293725
- ARC AGI without pretraining https://x.com/LiaoIsaac91893/status/1896944891319742499
- Bryan Johnson, what people don't realize is that the sole purpose of don't die is AI alignement https://x.com/bryan_johnson/status/1895977030866649518
- Humans never genuinely pursue happiness, they pursue relief from uncertainty. Happiness emerges temporarily https://x.com/adonis_singh/status/1895274910085455965
- Eric Schmidt: "a modest death event (chernobyl-level) might be the trigger for public understanding of the risks" https://x.com/vitrupo/status/1900707085949825484
Lol Funzies
- AI logos https://x.com/alxfazio/status/1900207026153619652
- ANUS https://x.com/GithubProjects/status/1900336455953899610
- What Ilya saw https://x.com/deedydas/status/1900089048087089179
- Vibe coding https://x.com/hellwaiver/status/1899882561553207629
- Google hype vs delivery https://x.com/ai_for_success/status/1900576120401220065
- wake up -> new company aiming at building ASI https://x.com/Miles_Brundage/status/1898054075037729061
- America top models https://x.com/andykreed/status/1897404483203948894
- https://x.com/goodside/status/1894838590678389187
Geo politics
- DeepSeek forbid employees from traveling abroad https://x.com/ns123abc/status/1900568402722205989
- China trying to produce the entire semiconductor supply chain domestically
- OpenAI asks to ban DeepSeek https://x.com/SmokeAwayyy/status/1900237615967920251
- DeepSeek, Huawei, Export Controls, and the future of the US-China AI Race - long report https://x.com/Gregory_C_Allen/status/1898040379611504983
Youtube
- Dwarkesh podcast + Jo Henrich https://x.com/dwarkesh_sp/status/1899867891597791481
- It's like medieval kings going from bows & arrows to nuclear weapons https://x.com/robertwiblin/status/1899516235303301389
- WelshLab: Dark Matter of AI https://www.youtube.com/watch?v=UGO_Ehywuxc
- WelshLab: The genius of DeepSeek's 57X efficiency boost https://www.youtube.com/watch?v=0VLAoVGf_74
Security
- Undocumented commands founds in all bluetooth devices https://x.com/stevesi/status/1898915823709982824
Golden Nuggets
- Satya Nadella no more Model API, back to building products https://x.com/EMostaque/status/1898517130905313485
Benchmarks
- https://x.com/NeilMcDevitt_/status/1897097811080188227
- why do frontier labs celebrate 0.5% margins https://x.com/lateinteraction/status/1896682075585220737
Fundraising, Grants, and Programs
- Anthropic raises Series E, $3.5B at $61.5B https://x.com/deedydas/status/1896629701986447572
- NextGenAI: a consortium to advance research and education with $50m in funding and tools from OpenAI https://x.com/OpenAINewsroom/status/1896926174603030837
