Sparse is Enough in Scaling Transformers

Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva
University of Warsaw; Google Research; OpenAI
