Updates — Voices from the AI Socratic Community

March 2025

Mar 16, 2025Models

Anthropic Claude Sonnet 3.7 (Aurora) 🛸

Codename Aurora 🛸 🌌

On March 10, 2025, Anthropic introduced Claude Sonnet 3.7, dubbed "Aurora," marking it as their most sophisticated model yet. Designed with a focus on seamless human-like interaction, enhanced interpretability, and superior safety features, Anthropic continues to push boundaries in AI development. Alongside the release, they published a Transparency Report detailing safety protocols, performance metrics, and training insights.

Here are the highlights of Claude Sonnet 3.7:

Performance: Excels in conversational fluency with a ChatEval score of 0.85 and cross-lingual understanding with an XGLUE score of 0.88, alongside a notably low error rate of 0.15 in sensitive contexts. On X many users have been posting imprevessive coding challenges, marking Sonnet yet the favorite coding LLM.
New Base Model: Acts as the foundation for Anthropic’s next-generation interpretive systems, emphasizing intuitive and safe responses.
Limitations: Performs strongly in dialogue and ethics-driven tasks but may lag in highly abstract or mathematical reasoning compared to specialized competitors, evidenced by third-party comparisons showing OpenAI’s lead in complex math.

Anthropic API Update

This is not the only update from Anthropic, they also just improved their API via prompt caching, reducing the amount of tokens and therefore inference cost 🔥!

Federico Ulfo

Comment

Mar 16, 2025Models

OpenAI releases GPT-4.5 (Orion) 🚀

Codename Orion 🚀 🌌

On February 27, 2025, OpenAI launched GPT-4.5, codenamed "Orion," as its most advanced model to date. GPT-4.5 focuses on natural conversation, reduced hallucinations, and improved general-purpose capabilities. OpenAI started using a System Card to share details including safety, benchmark, and training strategy.

Here are the highlights of GPT 4.5:

Performance: Excels in factual accuracy (PersonQA: 0.78) and multilingual tasks (MMLU English: 0.896), with a lower hallucination rate (0.19), helicone.ai.
New Base Model: Replaces previous models as the core for OpenAI’s reasoning systems, emphasizing natural dialogue.
Limitations: Strong in practical tasks but lags in complex reasoning compared to specialized models (vellum.ai)
Rollout: The rollout was staggered due to GPU shortages, with plans to expand access to other tiers.

Federico Ulfo

Comment

Mar 16, 2025Models

Google Flash 2.0 Image Generation, Gemma 3 27B

Google has been on a role! Releasing image generation for Gemini Flash 2.0 and releasing Gemma 3 27B, the new king in the small model ELO arena.

Google Flash 2.0 Image Generation

Launched March 12, 2025, Gemini 2.0 Flash by Google introduces native image generation within the model. It enables fast, context-aware image creation and editing, with a strong text rendering and storytelling consistency. We played with it, is quite fast and consistent, although in our experience the quality of the image is relatively low.

Gemma 3 27B — King of SLMs 👑

Released the same day, Gemma 3 27B is Google’s largest open-source model, built from Gemini research. With 27 billion parameters, it handles text and image inputs, supports a 128K token context, and excels in multilingual tasks and reasoning. Trained on 14T tokens, it runs efficiently on a single GPU and pairs with ShieldGemma 2 for safety. It’s ideal for developers seeking customizable, high-performance AI.

Federico Ulfo

Comment

Mar 16, 2025Models

Baidu ERNIE-4.5 — DeepSeek R1 level at half the price 👸

Another Small Language Model from the Chinese 🇨🇳 Baidu. This model is a DeepSeek R1 level but at half the cost.

The benchmark reported by Baidu shows that Ernie-4.5 beats 4o in multimodal capability and in some benchmark even gpt-4.5 in text capability — models today are getting overfitted for winning benchmark, running this model by hand generally shows a different story, and well, you can try it yourself here https://yiyan.baidu.com/ (just be aware that your data is getting collected).

Federico Ulfo

Comment

← NewerMarch 2025Older →

Anthropic Claude Sonnet 3.7 (Aurora) 🛸

Anthropic API Update

OpenAI releases GPT-4.5 (Orion) 🚀

Google Flash 2.0 Image Generation, Gemma 3 27B

#Google Flash 2.0 Image Generation

#Gemma 3 27B — King of SLMs 👑

Baidu ERNIE-4.5 — DeepSeek R1 level at half the price 👸

Search

Google Flash 2.0 Image Generation

Gemma 3 27B — King of SLMs 👑