Mistral Releases Magistral RL Reasoning Model

June 16, 2025 · Posted by Federico Ulfo

The Mistral team is at it again with Magistral, a reasoning model designed to excel at domain-specific, transparent, and multilingual reasoning.

Training uses GRPO (Group Relative Policy Optimization) with four modifications:

1. Removed KL divergence
2. Normalized by total length (Dr. GRPO style)
3. Minibatch normalization for advantages
4. Relaxed trust region
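
To make the four changes concrete, here is a minimal NumPy sketch of what such a loss could look like. It is one reading of the list above, not Mistral's implementation. The function name `modified_grpo_loss`, its arguments, and the clip values `eps_low=0.2` and `eps_high=0.28` are all hypothetical. In particular, "relaxing the trust region" is interpreted here as raising only the upper clipping bound, and "minibatch normalization" as dividing group-centered advantages by their standard deviation over the whole minibatch.

```python
import numpy as np

def modified_grpo_loss(logp_new, logp_old, rewards, group_ids,
                       eps_low=0.2, eps_high=0.28):
    """Hypothetical sketch of a GRPO-style loss with the four edits.

    logp_new, logp_old: lists of 1-D arrays, per-token log-probs of each
        sampled completion under the current and the sampling policy.
    rewards: one scalar reward per completion.
    group_ids: which prompt (group) each completion was sampled for.
    """
    rewards = np.asarray(rewards, dtype=float)
    group_ids = np.asarray(group_ids)

    # Group-relative baseline: subtract each prompt's mean reward.
    adv = np.empty_like(rewards)
    for g in np.unique(group_ids):
        mask = group_ids == g
        adv[mask] = rewards[mask] - rewards[mask].mean()

    # (3) Scale by the std of advantages over the whole minibatch
    # (assumed reading), not by a per-group std.
    adv = adv / (adv.std() + 1e-8)

    surrogate = 0.0
    for lp_new, lp_old, a in zip(logp_new, logp_old, adv):
        ratio = np.exp(np.asarray(lp_new) - np.asarray(lp_old))
        # (4) Asymmetric clip: the upper bound is looser than the lower
        # one (assumed reading of "relaxing trust region").
        clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
        surrogate += np.minimum(ratio * a, clipped * a).sum()

    # (2) Divide by the total number of tokens in the minibatch rather
    # than averaging per sequence first.
    total_tokens = sum(len(lp) for lp in logp_new)

    # (1) No KL penalty term is added to the objective.
    return -surrogate / total_tokens
```

Dividing by the total token count means every token carries the same weight regardless of which completion it belongs to, so long and short completions are not rescaled differently by a per-sequence average.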