LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer
