Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
