Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
