Updates — Voices from the AI Socratic Community

February 2025

Feb 21, 2025 Research

Scaling up test-time compute with latent reasoning

https://x.com/MatthewBerman/status/1890081482104008920

Federico Ulfo

Feb 12, 2025 Research

SFT Memorizes, RL Generalizes

DeepSeek has shown the power of Reinforcement Learning (RL) without Supervised Fine-Tuning (SFT). What does RL learn differently from SFT? As the title says: SFT memorizes, RL generalizes.

Federico Ulfo

Feb 12, 2025 Research

As AIs Get Smarter, They Develop Coherent Value Systems

As AIs get smarter, they develop their own coherent value systems.

For example, they rank the value of human lives by country in the order Pakistan > India > China > US.

These are not just random biases but internally consistent values that shape their behavior, with many implications for AI alignment.

Federico Ulfo

Feb 12, 2025 Research

Humanity's Last Exam Dataset Released

Humanity's Last Exam is a dataset of 3,000 questions with known, verifiable answers, developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.

Federico Ulfo

Feb 12, 2025 Research