Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
