SpeedLoader: An I/O efficient scheme for heterogeneous and distributed LLM operation

Mar-19-2026, 17:03:53 GMT–Neural Information Processing Systems

With the rapid growth in parameter counts, foundation models pose unprecedented challenges to traditional computational infrastructure. These large models inherently require substantial accelerator memory to hold massive tensors during pre-training, fine-tuning, and even inference, which makes deployment under restricted computational resources even harder. Distributing and offloading the model states are the two major solutions to this challenge. Partitioning the required states across participating workers, and storing them on slower media such as host DRAM and block devices, largely alleviates accelerator memory pressure. However, the prohibitive cost of tensor communication renders these approaches theoretically plausible yet practically inefficient.

artificial intelligence, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.39)
  - Natural Language (0.35)