Scaling up test-time compute with latent reasoning
https://x.com/MatthewBerman/status/1890081482104008920
SFT Memorizes, RL Generalizes. https://tianzhechu.com/SFTvsRL/ DeepSeek has shown the power of Reinforcement Learning (RL) without Supervised Fine-Tuning (SFT). What does RL learn differently than SFT? Wel
As AIs get smarter, they develop their own coherent value systems https://x.com/DanHendrycks/status/1889344074098057439. For example, they value human lives differently by country, from highest to lowest: Pakistan, India, China, US.
Humanity's Last Exam https://x.com/DanHendrycks/status/1882433928407241155 is a dataset with 3,000 questions, with known and verifiable answers, developed with hundreds of subject matter experts to cap
LLM beats doctor in treating patients https://x.com/emollick/status/1746022896508502138
New distributed training paper from Google DeepMind https://x.com/osanseviero/status/1885301292131582347
Scaling through decentralization https://x.com/Ronangmi/status/1885373092777910749