Inoculation Prompting (IP)

October 15, 2025Posted by Federico Ulfo

The paper introduces a simple trick for SFT on flawed data: edit the training prompt to explicitly ask for the undesired behavior, then evaluate with a neutral or safety prompt.

https://x.com/saprmarks/status/1975989959153811954