Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
