Analyzing the Structure of Attention in a Transformer Language Model