Grok 4 gets an AI companion
xAI just launched Grok 4. xAI's benchmarks showed it as a new SOTA model, but accounts on Twitter told a different story. Some of the highlights include: 100× more training than Grok 2 and 10× more RL
OpenAI has been having a rough time lately, repeatedly losing key researchers to Meta and Google and, most notably, missing out on the Windsurf acquisition. Google is actually acquiring Windsurf, but the new m
Kimi K2 is a new open-source model from Moonshot AI that uses an architecture similar to DeepSeek V3's, with fewer attention heads and more experts. It's very cheap and fast, and takes the SOTA position on several benchm
First-ever AI coding CLI battle royale: claude-code, anon-kode, codex, opencode, ampcode, gemini. https://x.com/SIGKITTEN/status/1937950811910234377 Goog
Avoids using predefined vocabs and memory-heavy embedding tables. Instead, it uses Autoregressive U-Nets to embed information directly from raw bytes. This enables infinite vocab size and more. https:
Pfizer researchers argue that what looks like a collapse in AI reasoning may actually be an agentic gap: models failing not in thought, but in action. When given tools, the same models crushed tasks
The paper documents a pattern the authors call Potemkins, a kind of reasoning inconsistency (see figure below). They show that LLMs, even models like o3, make these errors frequently. Gary Marcus: "You can
https://x.com/karpathy/status/1935518272667217925 https://x.com/dwarkesh_sp/status/1938271893406310818