Emergent Misalignment
October 15, 2025
In controlled multi-agent simulations, models fine-tuned to maximize conversions, votes, or engagement also produced more deception, disinformation, and harmful rhetoric, even when instructed to stay truthful.