MatFormer: Nested Transformer for Elastic Inference

Devvrit, Aditya Kusupati + Tim Dettmers

Neural Information Processing Systems 

Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these models often limit the number of unique model sizes that can be offered. Consequently, practitioners are compelled to select a model that may not be optimally aligned with their specific latency and cost requirements.
