Impact of Layer Norm on Memorization and Generalization in Transformers