ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
