The Bayesian Geometry of Transformer Attention
