Does RL improve LLM reasoning? (NeurIPS 2025 top paper)

November 6, 2025 · Posted by Federico Ulfo

This paper received the top score at NeurIPS 2025. It sets out to answer one question: does RL make LLMs better reasoners?

The authors study Reinforcement Learning with Verifiable Rewards (RLVR) and find that while it improves pass@k accuracy for small k (only a few sampled answers per problem), it doesn’t create new reasoning patterns, meaning the base model still determines the upper limit of reasoning ability.
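To make the "small k" point concrete, here is a minimal sketch of how pass@k is commonly computed, assuming the standard unbiased estimator 1 - C(n-c, k) / C(n, k), where n answers are sampled per problem and c of them are correct. The function names and the per-problem counts below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k answers is correct, when the k
    answers are drawn without replacement from n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(n: int, correct_counts: list[int], k: int) -> float:
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts (NOT from the paper): correct answers out of 256
# samples on two problems, one easy and one hard.
N_SAMPLES = 256
base_counts = [64, 2]   # base model: weaker on the easy one, rarely solves the hard one
rl_counts = [230, 0]    # RL model: strong on the easy one, never solves the hard one

for k in (1, 256):
    base = mean_pass_at_k(N_SAMPLES, base_counts, k)
    rl = mean_pass_at_k(N_SAMPLES, rl_counts, k)
    print(f"k={k}: base={base:.3f} rl={rl:.3f}")
```

With these made-up counts, the RL model wins at k=1 but the base model wins at k=256, because the base model solves the hard problem at least occasionally. That is the shape of the argument: a higher hit rate on problems the base model could already solve, without reaching problems it could not.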

Interestingly, it’s distillation, not RL, that shows genuine signs of emergent reasoning 😮.

link: x.com/jiqizhixin/status/1987710546674856051
web: limit-of-rlvr.github.io