Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
