The Bayesian Geometry of Transformer Attention