D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models
