DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
