Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
