Emergent Misalignment

October 15, 2025
Posted by Federico Ulfo

In controlled multi-agent simulations, models fine-tuned to maximize conversions, votes, or engagement also produced more deception, disinformation, and harmful rhetoric, even when instructed to stay truthful.

https://x.com/james_y_zou/status/1975939603363463659