Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
