Absolute Zero: Reinforced Self-Play Reasoning with Zero Data
An AI model learns to reason by inventing and then solving its own Python coding challenges, trained with reinforcement learning (RL) and no human-provided data. Author explanation: https:/
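
To make the propose-and-solve idea concrete, here is a toy sketch of a single self-play round. It is not the authors' implementation: the names `propose_task`, `run_task`, `solve_task`, and `average_reward` are hypothetical, the "model" is replaced by random stand-ins, and the RL policy update that would consume the reward is left out. The one point it illustrates is that executing the proposed Python program yields a verifiable answer, so a reward can be computed without human labels.

```python
import random

def propose_task(rng):
    """Hypothetical proposer: invent a tiny program and an input for it."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    source = f"def f(x):\n    return {a} * x + {b}\n"
    return source, rng.randint(0, 20)

def run_task(source, x):
    """Execute the proposed program to obtain the ground-truth output."""
    namespace = {}
    exec(source, namespace)  # acceptable here only because we generated the code
    return namespace["f"](x)

def solve_task(source, x, rng, skill):
    """Stand-in solver (assumption): correct with probability `skill`,
    otherwise off by one. A real solver would be the model predicting
    the output without running the code."""
    truth = run_task(source, x)
    return truth if rng.random() < skill else truth + 1

def self_play_round(rng, skill):
    """One round: propose a task, verify by execution, reward the solver."""
    source, x = propose_task(rng)
    expected = run_task(source, x)
    answer = solve_task(source, x, rng, skill)
    return 1.0 if answer == expected else 0.0

def average_reward(skill, rounds=200, seed=0):
    """Mean reward over several rounds; an RL step would try to raise this."""
    rng = random.Random(seed)
    return sum(self_play_round(rng, skill) for _ in range(rounds)) / rounds
```

Calling `average_reward(skill)` with different `skill` values shows how the reward signal tracks solver accuracy; in an actual system that signal would drive the policy update rather than just being averaged.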