Reviews: The Reversible Residual Network: Backpropagation Without Storing Activations

Neural Information Processing Systems 

The authors introduce "RevNets", which avoid storing (some) activations by utilizing computational blocks that are trivial to invert, so that activations can be reconstructed during the backward pass rather than stored. RevNets match the performance of ResNets with the same number of parameters, and in practice appear to save 4X in storage at the cost of a 2X increase in computation. Interestingly, the reversible blocks are also volume preserving, which is not explicitly discussed but should be, because this is a potential limitation. Since the approach of reconstructing activations rather than storing them is only applicable to invertible layers, it requires only O(1) storage for those layers, yet achieves only a 4X overall gain in storage requirements (which is nevertheless impressive).

One concern I have is that the recent work on decoupled neural interfaces (DNI) is not adequately discussed or compared to: DNI also requires O(1) storage, and estimates error signals (and optionally input values) analogously to how value functions are learned in reinforcement learning.
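To make the "trivial to invert" property concrete, here is a minimal sketch of the additive coupling scheme the paper builds on, where the input is split into two halves and each half is updated by a residual function of the other. The functions `F` and `G` below are arbitrary stand-ins for the paper's residual subnetworks; the split sizes and function choices are assumptions for illustration. Because each step is purely additive, the block can be inverted exactly by subtraction (and, since the Jacobian of each additive step is unit-triangular, the mapping is volume preserving).

```python
import numpy as np

# Stand-in residual functions (assumptions; the paper uses learned
# convolutional subnetworks here).
def F(x):
    return np.tanh(x)

def G(x):
    return 0.5 * np.sin(x)

def forward(x1, x2):
    # Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Exact inversion by subtraction: inputs are reconstructed from
    # outputs, so activations need not be stored for backprop.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(r1, x1) and np.allclose(r2, x2))
```

During the backward pass, each block's inputs can thus be recomputed on the fly from its outputs, which is the source of the memory savings (and of the extra computation).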