Inoculation Prompting (IP)
October 15, 2025
The paper introduces a simple trick for SFT on flawed data: edit the training prompt to explicitly ask for the undesired behavior, then evaluate with a neutral or safety prompt.
Get the latest AI insights delivered to your inbox. No spam, unsubscribe anytime.
Search across events, members, and blog posts
Comments
Sign in as a member to join the conversation.
Loading comments…