Runtime Adaptive Pruning for LLM Inference