HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference
