SFT Memorizes, RL Generalizes
February 12, 2025
DeepSeek has shown the power of Reinforcement Learning (RL) without Supervised Fine-Tuning (SFT). What does RL learn differently from SFT? As the title says: SFT memorizes, RL generalizes.