This paper shows that a single simple RL recipe can push 1.5B models to SoTA reasoning with half the compute

Updates | AI Socratic
