Updates — Voices from the AI Socratic Community

February 2025

Feb 21, 2025 · Models

xAI launches Grok 3

xAI launched Grok 3 this week. It is billed as an order of magnitude more capable than Grok 2, trained with 10x more computing power on xAI's Colossus cluster of 100k H100s.

Grok 3 excels in math, science, coding, and general knowledge, with notable performance in image understanding tasks, achieving a 73.2% score on the MMMU benchmark (xAI Blog).

It seems to be at o1 level, and it also introduces DeepSearch (xAI's name for its DeepResearch equivalent) and Search capabilities. The subscription costs $22/month. The most interesting feature of Grok is still its direct access to the Twitter feed.

So while Grok 3 is not at full o3 level (o3 has not launched yet), it signals that xAI is entering the arena as a serious competitor.

https://x.com/adonis_singh/status/1892109817830851060

Federico Ulfo


Feb 12, 2025 · Models

OpenAI Launches O3, Operator, and DeepResearch

We're only at the beginning of the year and OpenAI has already launched three new products under the $200/month Pro subscription.

o3 is a new family of OpenAI reasoning models that scores 87% on the ARC challenge. OpenAI released o3-mini and o3-mini-high, which are aimed at coding.
Operator is an agent mode, used with GPT-4o, that drives a simulated desktop to carry out tasks that require browsing web pages and clicking links. Here's Karpathy's take on Operator.
DeepResearch runs long research tasks that collect content across multiple sources and summarize it into a coherent report. It's a very powerful tool and has been received enthusiastically. It really makes the OpenAI Pro subscription worth it. DeepResearch is currently the highest scorer on Humanity's Last Exam.

https://x.com/tomaspueyo/status/1887270096013529530

Federico Ulfo


Feb 12, 2025 · Models

Gemini 2.0 Flash — Cheapest Model Yet

Google released Gemini 2.0 Flash, their most impressive LLM yet. What sets 2.0 Flash apart from other LLMs is its incredibly low cost and its ability to process PDFs. It costs only $0.40 per million tokens and has a 1M-token context window, which means you can now parse 6000 long PDFs at near-perfect quality for $1.
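To make the cost claim concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above ($0.40 per million tokens, 6000 documents per $1) and assumes a single flat per-token price, which is a simplification since real pricing may differ for input and output tokens. The function names are made up for illustration.

```python
# Figures quoted in the post; treat them as the post's claims, not verified pricing.
PRICE_PER_MILLION_TOKENS_USD = 0.40
DOCUMENTS_PER_DOLLAR = 6000


def tokens_per_dollar(price_per_million_usd: float) -> float:
    """Return how many tokens one dollar buys at a flat per-million-token price."""
    return 1_000_000 / price_per_million_usd


def tokens_per_document(budget_tokens: float, documents: int) -> float:
    """Return the average token budget per document for a given total budget."""
    return budget_tokens / documents


budget = tokens_per_dollar(PRICE_PER_MILLION_TOKENS_USD)
per_doc = tokens_per_document(budget, DOCUMENTS_PER_DOLLAR)

print(f"Tokens per $1: {budget:,.0f}")
print(f"Average tokens per document at 6000 documents per $1: {per_doc:,.0f}")
```

Under these assumptions, $1 buys about 2.5M tokens, or a little over 400 tokens per document. That is closer to a single page than to a long PDF, so the 6000 figure is probably best read as pages rather than whole documents.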

Flash 2.0 is the new king 👑 on the block.

Gemini 2.0 Flash has a better Elo score than DeepSeek R1 and is cheaper.

Federico Ulfo


Feb 12, 2025 · Models

Mistral Ships Le Chat

Mistral just shipped Le Chat, a ChatGPT competitor billed as 13x faster, 100% open-source, and completely free (vs. $20/month for ChatGPT).

https://x.com/itsolelehmann/status/1888290407127388497

Federico Ulfo

