An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models