Mistral Releases Magistral RL Reasoning Model
June 16, 2025
The Mistral team is at it again with Magistral, a reasoning model designed to excel at domain-specific, transparent, and multilingual reasoning.
It is trained with GRPO, with four edits:
1. Removed the KL divergence penalty
2. Normalized the loss by total length (Dr. GRPO style)
3. Normalized advantages over the minibatch
4. Relaxed the trust region
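
To make the four edits concrete, here is a minimal pure-Python sketch of how they could fit into a PPO-style clipped objective. It is an illustration under stated assumptions, not Mistral's implementation: the function name `grpo_loss_sketch`, its argument layout, and the clip values `eps_low=0.2` and `eps_high=0.28` are all hypothetical, and reading "relaxing the trust region" as a higher upper clip bound is an interpretation of the list above.

```python
import math

def grpo_loss_sketch(logp_new, logp_old, rewards, group_ids,
                     eps_low=0.2, eps_high=0.28):
    """Hypothetical helper: scalar loss for one minibatch of completions.

    logp_new / logp_old: one list of per-token log-probs per completion,
        under the current and the sampling policy respectively.
    rewards: one scalar reward per completion.
    group_ids: prompt index per completion; completions that share a
        prompt form one GRPO group.
    """
    # Group baseline: subtract the mean reward of each completion's group.
    groups = {}
    for r, g in zip(rewards, group_ids):
        groups.setdefault(g, []).append(r)
    means = {g: sum(rs) / len(rs) for g, rs in groups.items()}
    adv = [r - means[g] for r, g in zip(rewards, group_ids)]

    # Edit 3: scale advantages by their std over the whole minibatch,
    # rather than by each group's own std.
    mu = sum(adv) / len(adv)
    std = math.sqrt(sum((a - mu) ** 2 for a in adv) / len(adv))
    adv = [a / (std + 1e-8) for a in adv]

    total, n_tokens = 0.0, 0
    for new, old, a in zip(logp_new, logp_old, adv):
        for lp_new, lp_old in zip(new, old):
            ratio = math.exp(lp_new - lp_old)
            # Edit 4 (assumed reading): asymmetric clip with a looser
            # upper bound, so low-probability tokens can grow more.
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * a, clipped * a)
            n_tokens += 1

    # Edit 2: divide by the total token count across all completions,
    # not by each completion's own length.
    # Edit 1: no KL penalty against a reference model is added.
    return -total / n_tokens
```

Calling it with identical old and new log-probs gives a loss of roughly zero when all completions have the same length, because the normalized advantages cancel. The clipping only takes effect once the two policies diverge.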

https://arxiv.org/pdf/2506.10910
Simon Wilson: all LLM API vendors are converging to the same product: