Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
