Attention and Compression is all you need for Controllably Efficient Language Models
