AITopics | zero-infinity

10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training

Afroz, Sabiha, Khan, Redwan Ibne Seraj, Albahar, Hadeel, Han, Jingoo, Butt, Ali R.

arXiv.org Artificial Intelligence, Nov-19-2025

Training large language models (LLMs) in the cloud faces growing memory bottlenecks due to the limited capacity and high cost of GPUs. While GPU memory offloading to CPU and NVMe has made large-scale training more feasible, existing approaches suffer from high tensor migration latency and suboptimal device memory utilization, ultimately increasing training time and cloud costs. To address these challenges, we present 10Cache, a resource-aware tensor caching and migration system that accelerates LLM training by intelligently coordinating memory usage across GPU, CPU, and NVMe tiers. 10Cache profiles tensor execution order to construct prefetch policies, allocates memory buffers in pinned memory based on tensor size distributions, and reuses memory buffers to minimize allocation overhead. Designed for cloud-scale deployments, 10Cache improves memory efficiency and reduces reliance on high-end GPUs. Across diverse LLM workloads, it achieves up to 2x speedup in training time, improves GPU cache hit rate by up to 86.6x, and increases CPU/GPU memory utilization by up to 2.15x and 1.33x, respectively, compared to state-of-the-art offloading methods. These results demonstrate that 10Cache is a practical and scalable solution for optimizing LLM training throughput and resource efficiency in cloud environments.
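The abstract names three mechanisms: prefetch policies derived from the profiled tensor execution order, buffers sized by the tensor size distribution, and buffer reuse. The following is a minimal sketch of those ideas only, not the 10Cache implementation. The names `build_prefetch_plan`, `BufferPool`, the `lookahead` parameter, and the bucket sizes are all hypothetical, and plain `bytearray` objects stand in for pinned host memory.

```python
from collections import defaultdict


def build_prefetch_plan(trace, lookahead=2):
    """For each step of a profiled trace, list the tensors to prefetch next.

    `trace` is the order in which tensors were accessed during one profiled
    iteration. The plan for step i holds the distinct tensors used in the
    following `lookahead` steps, excluding the tensor already in use.
    """
    plan = []
    for i, current in enumerate(trace):
        upcoming = []
        for name in trace[i + 1:i + 1 + lookahead]:
            if name != current and name not in upcoming:
                upcoming.append(name)
        plan.append(upcoming)
    return plan


class BufferPool:
    """Size-bucketed buffer pool that reuses released buffers.

    Bucket sizes would be chosen from the observed tensor size distribution;
    here they are passed in directly (an assumption for illustration).
    """

    def __init__(self, bucket_sizes):
        self.bucket_sizes = sorted(bucket_sizes)
        self.free = defaultdict(list)  # bucket size -> released buffers
        self.allocations = 0           # number of fresh allocations made

    def _bucket(self, nbytes):
        # Pick the smallest bucket that can hold the request.
        for size in self.bucket_sizes:
            if nbytes <= size:
                return size
        raise ValueError(f"no bucket large enough for {nbytes} bytes")

    def acquire(self, nbytes):
        size = self._bucket(nbytes)
        if self.free[size]:
            return self.free[size].pop()  # reuse, no new allocation
        self.allocations += 1
        return bytearray(size)            # stand-in for a pinned buffer

    def release(self, buf):
        self.free[len(buf)].append(buf)


if __name__ == "__main__":
    print(build_prefetch_plan(["w0", "w1", "w2", "w0"]))
    pool = BufferPool([1024, 4096])
    first = pool.acquire(1000)
    pool.release(first)
    second = pool.acquire(512)  # served from the released 1024-byte buffer
    print(second is first, pool.allocations)
```

In this sketch a request is rounded up to its bucket, so tensors of similar size share buffers and a released buffer can serve the next request without a new allocation.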

large language model, machine learning, natural language

arXiv:2511.14124

Country:

Europe (0.28)
North America > United States (0.16)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Microsoft's ZeRO-Infinity Library Trains 32 Trillion Parameter AI Model

#artificialintelligence, Jun-25-2021, 03:10:47 GMT

Microsoft recently announced ZeRO-Infinity, an addition to their open-source DeepSpeed AI training library that optimizes memory use for training very large deep-learning models. Using ZeRO-Infinity, Microsoft trained a model with 32 trillion parameters on a cluster of 32 GPUs, and demonstrated fine-tuning of a 1 trillion parameter model on a single GPU. The DeepSpeed team described the new features in a recent blog post. ZeRO-Infinity is the latest iteration of the Zero Redundancy Optimizer (ZeRO) family of memory optimization techniques. ZeRO-Infinity introduces several new strategies for addressing memory and bandwidth constraints when training large deep-learning models, including: a new offload engine for exploiting CPU and Non-Volatile Memory express (NVMe) memory, memory-centric tiling to handle large operators without model-parallelism, bandwidth-centric partitioning for reducing bandwidth costs, and an overlap-centric design for scheduling data communication.
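As a rough illustration of how the CPU and NVMe offload described above is usually enabled, the sketch below builds a DeepSpeed-style JSON configuration with ZeRO stage 3 and both optimizer state and parameters offloaded to NVMe. The path `/local_nvme`, the batch size, and the helper name `make_offload_config` are placeholder assumptions; the configuration keys should be checked against the DeepSpeed documentation for the version in use.

```python
import json


def make_offload_config(nvme_path="/local_nvme", micro_batch_size=1):
    """Return a DeepSpeed-style config dict with NVMe offload enabled.

    `nvme_path` and `micro_batch_size` are illustrative defaults, not
    recommended values.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,  # partition parameters, gradients, optimizer state
            "overlap_comm": True,
            "offload_optimizer": {
                "device": "nvme",
                "nvme_path": nvme_path,
                "pin_memory": True,
            },
            "offload_param": {
                "device": "nvme",
                "nvme_path": nvme_path,
                "pin_memory": True,
            },
        },
    }


if __name__ == "__main__":
    # Write the result to a file and pass it to the training launcher.
    print(json.dumps(make_offload_config(), indent=2))
```

Setting `"device": "cpu"` in either offload section would keep that state in host memory instead of on NVMe, which is the usual choice when no fast local NVMe drive is available.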

trillion parameter ai model, zero-infinity, zero-infinity library train 32

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

Rajbhandari, Samyam, Ruwase, Olatunji, Rasley, Jeff, Smith, Shaden, He, Yuxiong

arXiv.org Artificial Intelligence, Apr-15-2021

In the last three years, the largest dense deep learning models have grown over 1000x to reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 GB to 80 GB). Therefore, the growth in model scale has been supported primarily through system innovations that allow large models to fit in the aggregate GPU memory of multiple GPUs. However, we are getting close to the GPU memory wall. It requires 800 NVIDIA V100 GPUs just to fit a trillion parameter model for training, and such clusters are simply out of reach for most data scientists. In addition, training models at that scale requires complex combinations of parallelism techniques that put a big burden on the data scientists to refactor their model. In this paper we present ZeRO-Infinity, a novel heterogeneous system technology that leverages GPU, CPU, and NVMe memory to allow for unprecedented model scale on limited resources without requiring model code refactoring. At the same time it achieves excellent training throughput and scalability, unencumbered by the limited CPU or NVMe bandwidth. ZeRO-Infinity can fit models with tens and even hundreds of trillions of parameters for training on current generation GPU clusters. It can be used to fine-tune trillion parameter models on a single NVIDIA DGX-2 node, making large models more accessible. In terms of training throughput and scalability, it sustains over 25 petaflops on 512 NVIDIA V100 GPUs (40% of peak), while also demonstrating superlinear scalability. An open source implementation of ZeRO-Infinity is available through DeepSpeed, a deep learning optimization library that makes distributed training easy, efficient, and effective.
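The scale figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the paper's own accounting: it assumes 32 GB of memory per V100 and roughly 16 bytes of model state per parameter for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance). The function names are hypothetical.

```python
def gpus_needed(num_params, bytes_per_param, gpu_mem_gb):
    """GPUs required to hold the model states alone (ceiling division)."""
    total_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_mem_gb * 10**9
    return -(-total_bytes // gpu_bytes)


def per_gpu_tflops(total_pflops, num_gpus):
    """Average sustained throughput per GPU in teraflops."""
    return total_pflops * 1000 / num_gpus


def implied_peak_tflops(total_pflops, num_gpus, fraction_of_peak):
    """Per-GPU peak implied by a sustained rate and a utilization figure."""
    return per_gpu_tflops(total_pflops, num_gpus) / fraction_of_peak


if __name__ == "__main__":
    # Model states only; activations and working buffers are not counted.
    print(gpus_needed(10**12, 16, 32))          # assumed 16 bytes/param
    print(per_gpu_tflops(25, 512))              # 25 petaflops over 512 GPUs
    print(implied_peak_tflops(25, 512, 0.40))   # 40% of peak, per the abstract
```

Under these assumptions the model states of a trillion-parameter model alone occupy 500 GPUs; the abstract's figure of 800 is higher, presumably because training also needs memory for activations and working buffers. The throughput figures work out to about 49 teraflops sustained per GPU, which at 40% utilization implies a per-GPU peak of roughly 122 teraflops.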

bandwidth, parallelism, zero-infinity

arXiv:2104.07857

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre:

Research Report (1.00)
Overview (0.66)

Industry: Information Technology (0.89)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
