Order-Level Attention Similarity Across Language Models: A Latent Commonality
