MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
