Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient