When Attention Sink Emerges in Language Models: An Empirical View