backpropagate through an equilibrium state of the network (which, to the best of our knowledge, no deep approaches
–Neural Information Processing Systems
We thank the reviewers for their valuable feedback. The way DEQ "ignores" depth and solves for the equilibrium suggests a different view of output modeling and further [...] We also agree with the reviewers that the runtime discussion should be moved into the main text.

We thank reviewer #1 for the valuable feedback. The DEQ approach is very different from techniques like gradient checkpointing (GC). GC is an implementation-based methodology that is practical on almost any layer-based network. Quantitatively, we have followed the reviewer's suggestion and compared GC and DEQ using a 70-layer TrellisNet (w/ [...] We find that GC works best when we checkpoint after every 9 layers, and record a 5.2GB [...] The training speed of GC is approximately 1.6 [...]

We thank reviewer #3 for the comments, and for taking the time to check our proof and read our code.
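To make the checkpointing trade-off in the response to reviewer #1 concrete, the following is a minimal sketch of how peak activation storage might vary with the checkpoint interval for a 70-layer network. It is not the measurement setup behind the figures above. The function name `checkpoint_memory_units` and the cost model are assumptions introduced for illustration: every layer activation is taken to have the same size, one checkpoint is kept per segment, and a single segment is recomputed at a time during the backward pass.

```python
import math


def checkpoint_memory_units(num_layers: int, segment_len: int) -> int:
    """Rough count of equally sized activations held at peak (hypothetical model).

    One checkpoint is stored per segment, plus the activations of the single
    segment being recomputed during the backward pass.
    """
    if num_layers <= 0 or segment_len <= 0:
        raise ValueError("num_layers and segment_len must be positive")
    num_checkpoints = math.ceil(num_layers / segment_len)
    recomputed = min(segment_len, num_layers)
    return num_checkpoints + recomputed


NUM_LAYERS = 70  # depth quoted in the response; everything else is assumed

# Evaluate every possible checkpoint interval under the simplified model.
costs = {k: checkpoint_memory_units(NUM_LAYERS, k) for k in range(1, NUM_LAYERS + 1)}
best = min(costs.values())
best_segment_lengths = [k for k, c in costs.items() if c == best]
```

Under these assumptions the minimum is 17 activation units, reached for intervals of 7 through 10 layers, compared with 70 units when every activation is stored. An interval of 9 is therefore among the minimizers of this simplified count, which is in line with, but does not reproduce, the empirical choice reported above. The model says nothing about absolute memory in GB or about training speed.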
Nov-15-2025, 18:42:56 GMT