On the Expressivity Role of LayerNorm in Transformers' Attention