Emergence of Linear Truth Encodings in Language Models

Open in new window