Accelerating Transformer Inference for Translation via Parallel Decoding
