Measuring the Mixing of Contextual Information in the Transformer
