Alignment faking in large language models
