Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features