On Optimal Caching and Model Multiplexing for Large Model Inference