Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking