Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
