AI Socratic
April 2025
Models

OpenAI releases o3 and o4-mini reasoning models

OpenAI has just released o3 to all its customers, and it is a reasoning powerhouse! Initially teased under Project Strawberry, it outstrips GPT-4o with a 200K-token context and top-tier logic skills.

Key Highlights

  • Reasoning Beast: 87.7% on GPQA Diamond, 71.7% on SWE-Bench Verified, 87.5% on the ARC-AGI-1 test.
  • Autonomous Tools: Search, Python, image generation for seamless problem-solving.
  • Affordable Mini: o3-mini delivers coding precision at lower costs.

Why It Rocks

  • Sharpened Logic: Precise, step-by-step reasoning for complex tasks.
  • Budget-Friendly: o3-mini makes elite AI accessible.
  • Tested & Polished: Community feedback shaped a stellar release.

There are lots of tweets about it, with mostly positive feedback!

https://x.com/danshipper/status/1912551847056785841

Dropped with hype and minor scaling hiccups, o3 cements OpenAI’s lead in the AI race! o3-mini is also great, apparently.

https://x.com/ren_hongyu/status/1908035698579395066

It's great, but it still suffers from hallucination, which at this point looks more like a feature than a bug of transformer models.

https://x.com/TransluceAI/status/1912552046269771985

While not perfect, o3 makes a leap forward in image recognition and understanding:

https://x.com/ErnestRyu/status/1913045962614087878

https://x.com/goodside/status/1912921153217118696

Federico Ulfo
Models

OpenAI drops GPT-4.1 with Mini and Nano variants

OpenAI dropped GPT-4.1, stealing the spotlight with Mini and Nano variants. Initially teased as Quasar Alpha and Optimus Alpha on OpenRouter, this release outshines GPT-4o with a 1M token context and killer coding skills.

Critics are debating whether 4.1 is a larger number than 4.5; it depends on whether you read it as 4.10 or 4.1.

Joking aside, it's a good model, and here's some info about it.

Key Highlights

  • Massive Context: 1M token window for deep, complex tasks.
  • Coding Prowess: Scores 55% on SWE-Bench Verified, no reasoning needed.
  • Affordable Power: $2/$8 per 1M tokens (input/output), with Mini at $0.40/$1.60.
  • Stealth Drop: OpenRouter’s Quasar and Optimus were GPT-4.1 testbeds, now retired.
  • Scalability Hiccups: High demand caused capacity concerns pre-launch.
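
To make the pricing bullet concrete, here is a minimal sketch of the arithmetic, using only the per-1M-token rates quoted above. The dictionary keys, the `estimate_cost` helper, and the example token counts are illustrative assumptions, not part of any official SDK; Nano is left out because the post does not list its price.

```python
# Per-1M-token prices (USD) as quoted in the highlights above.
# The keys are labels chosen for this sketch.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    rates = PRICES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 200k-token prompt with a 5k-token answer.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 5_000):.2f}")
```

At these rates the hypothetical request comes to about $0.44 on GPT-4.1 and about $0.09 on Mini, which shows how much long prompts dominate the bill when the output is short.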

How It Shines

  • Enhanced Instructions: Sharper, more precise responses.
  • Cost-Effective Variants: Mini and Nano make intelligence dirt cheap.
  • Collaborative Testing: OpenRouter’s alpha phase ensured a polished release.

Launched with buzz and a few scaling worries, GPT-4.1 is OpenAI’s bold step to dominate the AI race!

Federico Ulfo
Models

Google Gemini 2.5 Pro launches with strong benchmarks

Gemini 2.5 Pro was received positively. Codenamed Nebula, it is multimodal and has a 1M-token context window, with 2M planned. It's a reasoning model that scores high on the benchmarks.

Key Highlights

  • Massive Context: 1M token window, with 2M planned, for tackling huge tasks.
  • Coding Prowess: Scores 63.8% on SWE-Bench Verified, shines in web apps and code editing.
  • Multimodal Mastery: Handles text, audio, images, video, and code natively.
  • Stealth Drop: Experimental rollout in Gemini Advanced and Google AI Studio, now expanding.
  • Benchmark Dominance: Tops LMArena, AIME 2025, and GPQA without extra tricks.

How It Shines

  • Advanced Reasoning: Thinks before answering, delivering precise, context-aware responses.
  • Broad Access: Free experimental version and rollout to Gemini app users boost adoption.
  • Developer-Friendly: Large context reduces reliance on tools like RAG, streamlining workflows.
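
As a rough illustration of the "large context reduces reliance on RAG" point, the sketch below checks whether a set of documents would plausibly fit in a single 1M-token prompt. It is a back-of-the-envelope check under stated assumptions: the 4-characters-per-token ratio is a common rule of thumb for English text rather than a real tokenizer, and `reserved_for_output` is a hypothetical budget chosen for this example.

```python
# Context window size (tokens) as stated in the post.
CONTEXT_WINDOW_TOKENS = 1_000_000

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary, so treat the result as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count (heuristic, not a tokenizer)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    """Return True if all documents plus an output budget fit in one prompt."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    small_corpus = ["a" * 400_000]    # about 100k estimated tokens
    large_corpus = ["a" * 4_000_000]  # about 1M estimated tokens
    print(fits_in_context(small_corpus))
    print(fits_in_context(large_corpus))
```

If the check fails, retrieval or chunking is still the practical fallback; if it passes, the whole corpus can go straight into the prompt.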

Launched with hype and a few pricing questions, Gemini 2.5 Pro is Google’s bold move to lead the AI frontier!

Gemini is reportedly better at Deep Research than o3. In my personal experience, o3 Deep Research is still better, although xAI's deep research is my favorite for tweet research and summarization because it is faster.

https://x.com/daniel_mac8/status/1909735258985316377

Federico Ulfo
