On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback