Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking