Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
