Towards Optimal Caching and Model Selection for Large Model Inference