Limitations of Normalization in Attention Mechanism