The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
