AI Socratic
February 2025

SFT Memorizes, RL Generalizes

SFT Memorizes, RL Generalizes (https://tianzhechu.com/SFTvsRL/). DeepSeek has shown the power of Reinforcement Learning (RL) without Supervised Fine-Tuning (SFT). What does RL learn differently than SFT? Wel

Federico Ulfo

Humanity's Last Exam Dataset Released

Humanity's Last Exam (https://x.com/DanHendrycks/status/1882433928407241155) is a dataset of 3,000 questions with known and verifiable answers, developed with hundreds of subject matter experts to cap

Federico Ulfo