Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
