Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks
