Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
