Does RL improve LLM reasoning? (NeurIPS 2025 top paper)
November 6, 2025

This paper received the top score at NeurIPS 2025. It asks: does RL make LLMs better reasoners?
The authors study Reinforcement Learning with Verifiable Rewards (RLVR) and find that while it improves accuracy for small k (few sampled attempts per problem), it doesn't create new reasoning patterns, meaning the base model still determines the upper limit of reasoning ability.
Interestingly, it’s distillation, not RL, that shows genuine signs of emergent reasoning 😮.
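To make the "small k" wording concrete: results like this are usually reported as pass@k, the probability that at least one of k sampled answers to a problem is correct. Below is a minimal sketch of the standard unbiased pass@k estimator, assuming that is the metric in play. The function names and the per-problem counts are made up for illustration and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples drawn, c: how many of them were correct, k: attempt budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    # 1 - P(all k samples drawn without replacement are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(n: int, correct_counts: list[int], k: int) -> float:
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts of correct samples out of n = 256, for four problems.
# "base" solves every problem occasionally; "rl" solves two problems reliably
# and never solves the other two.
N_SAMPLES = 256
base_counts = [8, 4, 2, 1]
rl_counts = [128, 96, 0, 0]

for k in (1, 16, 256):
    base = mean_pass_at_k(N_SAMPLES, base_counts, k)
    rl = mean_pass_at_k(N_SAMPLES, rl_counts, k)
    print(f"k={k:>3}  base={base:.3f}  rl={rl:.3f}")
```

With these toy numbers the "rl" model is ahead at k = 1 but falls behind once k is large, because the "base" model has a nonzero chance on every problem. That is the shape of result the post describes: better accuracy at small k, with the ceiling at large k set by what the base model can already reach.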
link: x.com/jiqizhixin/status/1987710546674856051
web: limit-of-rlvr.github.io