Reviews: A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation
–Neural Information Processing Systems
The paper proposes a method that reduces the memory consumption of backpropagation and, as a result, allows the use of larger batch sizes. The authors state that this is significant, e.g., with batch normalization, where batch size matters. In my own experience, this is also significant for improved GPU utilization and data-parallel training. The paper is original in that I have not seen a similar treatment (although the actual solution may have been relatively straightforward; asking the right question was an important part of this). The paper is well written and easy to follow.
Jan-27-2025, 18:23:37 GMT