AITopics | rematerialization

When training complex neural networks, memory usage can be an important bottleneck.

artificial intelligence, decomposition, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Efficient Combination of Rematerialization and Offloading for Training DNNs

Neural Information Processing SystemsFeb-11-2026, 03:26:24 GMT

Rematerialization and offloading are two well known strategies to save memory during the training phase of deep neural networks, allowing data scientists to consider larger models, batch sizes or higher resolution data.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Coop: Memory is not a Commodity

Neural Information Processing SystemsDec-26-2025, 10:30:43 GMT

Tensor rematerialization allows the training of deep neural networks (DNNs) under limited memory budgets by checkpointing the models and recomputing the evicted tensors as needed. However, the existing tensor rematerialization techniques overlook the memory system in deep learning frameworks and implicitly assume that free memory blocks at different addresses are identical. Under this flawed assumption, discontiguous tensors are evicted, among which some are not used to allocate the new tensor. This leads to severe memory fragmentation and increases the cost of potential rematerializations.To address this issue, we propose to evict tensors within a sliding window to ensure all evictions are contiguous and are immediately used. Furthermore, we proposed cheap tensor partitioning and recomputable in-place to further reduce the rematerialization cost by optimizing the tensor allocation.We named our method Coop as it is a co-optimization of tensor allocation and tensor rematerialization. We evaluated Coop on eight representative DNNs. The experimental results demonstrate that Coop achieves up to $2\times$ memory saving and hugely reduces compute overhead, search latency, and memory fragmentation compared to the state-of-the-art baselines.

name change, rematerialization, tensor, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Efficient Rematerialization for Deep Networks

Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, Joshua Wang

Neural Information Processing SystemsAug-20-2025, 11:28:40 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, decomposition, tree decomposition, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Mountain View (0.05)
Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ffe10334251de1dc98339d99ae4743ba-AuthorFeedback.pdf

Neural Information Processing SystemsAug-20-2025, 11:28:24 GMT

We thank the reviewers for their thoughtful comments. But consider the case of training BERT on a TPU pod, which takes around 4 days. We provide a formalization of the problem with rigorous guarantees. We now address a few of the specific reviewer concerns. However, in the revised version of this paper we will include a more thorough discussion of this. That post draws on Courcelle's theorem (namely, every graph property definable in the monadic second-order We feel that it's more accurate to avoid

algorithm, model parallelism, rematerialization, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Efficient Combination of Rematerialization and Offloading for Training DNNs

Neural Information Processing SystemsAug-17-2025, 08:24:00 GMT

Rematerialization and offloading are two well known strategies to save memory during the training phase of deep neural networks, allowing data scientists to consider larger models, batch sizes or higher resolution data.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Coop: Memory is not a Commodity

Neural Information Processing SystemsJan-19-2025, 16:46:57 GMT

Tensor rematerialization allows the training of deep neural networks (DNNs) under limited memory budgets by checkpointing the models and recomputing the evicted tensors as needed. However, the existing tensor rematerialization techniques overlook the memory system in deep learning frameworks and implicitly assume that free memory blocks at different addresses are identical. Under this flawed assumption, discontiguous tensors are evicted, among which some are not used to allocate the new tensor. This leads to severe memory fragmentation and increases the cost of potential rematerializations.To address this issue, we propose to evict tensors within a sliding window to ensure all evictions are contiguous and are immediately used. Furthermore, we proposed cheap tensor partitioning and recomputable in-place to further reduce the rematerialization cost by optimizing the tensor allocation.We named our method Coop as it is a co-optimization of tensor allocation and tensor rematerialization.

coop, rematerialization, tensor, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Moccasin: Efficient Tensor Rematerialization for Neural Networks

Bartan, Burak, Li, Haoming, Teague, Harris, Lott, Christopher, Dilkina, Bistra

arXiv.org Artificial IntelligenceMay-30-2023

The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over the works in the recent literature that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies that show that our approach is up to an order of magnitude faster than recent work especially for large-scale graphs.

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.14463

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

Patil, Shishir G., Jain, Paras, Dutta, Prabal, Stoica, Ion, Gonzalez, Joseph E.

arXiv.org Artificial IntelligenceJul-15-2022

Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures because training is both memory and energy intensive. We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices. POET jointly optimizes the integrated search search spaces of rematerialization and paging, two algorithms to reduce the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption while not modifying mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet

paging, poet, rematerialization, (16 more...)

arXiv.org Artificial Intelligence

2207.07697

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.41)

Industry:

Energy (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Jain, Paras, Jain, Ajay, Nrusimha, Aniruddha, Gholami, Amir, Abbeel, Pieter, Keutzer, Kurt, Stoica, Ion, Gonzalez, Joseph E.

arXiv.org Machine LearningOct-7-2019

Modern neural networks are increasingly bottlenecked by the limited capacity of on-device GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural networks under memory constraints. However, these heuristics assume uniform per-layer costs and are limited to simple architectures with linear graphs, limiting their usability. In this paper, we formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1$\times$ larger input sizes.

batch size, graph, memory usage, (16 more...)

arXiv.org Machine Learning

1910.02653

Country: