On Optimal Caching and Model Multiplexing for Large Model Inference

Neural Information Processing Systems 

By combining a caching algorithm, namely Greedy Dual Size with Frequency (GDSF) or Least Expected Cost (LEC), with a model multiplexer, we achieve optimal rates in both offline and online settings.
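
As a rough illustration of how a cost-aware cache and a model multiplexer could fit together, the sketch below uses the textbook form of GDSF: each entry gets priority clock + frequency × cost / size, and on eviction the clock advances to the evicted entry's priority. This is a minimal sketch under stated assumptions, not the paper's implementation. The names `GDSFCache`, `answer`, and `prefer_small`, along with the per-model cost values, are hypothetical, and LEC is not shown.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    cost: float      # assumed cost of regenerating this response
    size: float      # assumed storage footprint of this response
    freq: int        # number of accesses, counting the initial insert
    priority: float  # GDSF priority: clock + freq * cost / size


class GDSFCache:
    """Textbook Greedy Dual Size with Frequency cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0.0
        self.clock = 0.0   # inflation value, raised on every eviction
        self.entries = {}

    def _priority(self, entry):
        return self.clock + entry.freq * entry.cost / entry.size

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.freq += 1
        entry.priority = self._priority(entry)
        return entry.value

    def put(self, key, value, cost, size):
        if size > self.capacity:
            return  # item can never fit, so skip caching it
        if key in self.entries:
            self.used -= self.entries.pop(key).size
        # Evict lowest-priority entries until the new item fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k].priority)
            self.clock = self.entries[victim].priority
            self.used -= self.entries.pop(victim).size
        entry = Entry(value, cost, size, 1, 0.0)
        entry.priority = self._priority(entry)
        self.entries[key] = entry
        self.used += size


def answer(query, cache, small_model, large_model, prefer_small,
           small_cost=1.0, large_cost=10.0):
    """Serve from cache if possible; otherwise route to one of two models.

    `prefer_small` is a hypothetical predicate standing in for a learned
    multiplexer; the cost values are placeholders, not measured numbers.
    """
    cached = cache.get(query)
    if cached is not None:
        return cached
    if prefer_small(query):
        response, cost = small_model(query), small_cost
    else:
        response, cost = large_model(query), large_cost
    cache.put(query, response, cost=cost, size=1.0)
    return response
```

In this sketch, responses produced by the expensive model receive a higher priority and therefore survive eviction longer than cheap ones with the same access frequency, which is the intuition behind pairing cost-aware caching with model selection.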
