Faster Transformer Decoding: N-gram Masked Self-Attention