Sparse Modular Activation for Efficient Sequence Modeling
Yang Liu, Shuohang Wang
