LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference
