LatentQA: Teaching LLMs to Decode Activations Into Natural Language
