Reviews: Deep Equilibrium Models
–Neural Information Processing Systems
Based on the authors' response, I find the comparison against gradient checkpointing they provide satisfactory. Please ensure it is included in the final draft.

This work considers handling sequences of network layers with identical weights (i.e. ...). Instead of directly computing the sequence, a quasi-Newton method is used to approximate the fixed point of the sequence. This has the advantage that the gradient has a simpler form, although one which must also be computed iteratively.

The advantages are:
• Much lower memory usage, as intermediate tensors do not need to be stored for use in the backward pass: approximately 4-10x lower for the considered models.
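To make the fixed-point idea in the summary concrete, here is a minimal scalar sketch. It is a toy illustration only, not the authors' implementation. The layer `tanh(w*z + x)`, the function names, and the chosen values of `x` and `w` are all assumptions made for this example. The secant iteration stands in for the quasi-Newton solver (in one dimension, Broyden's method reduces to the secant method), and the gradient is obtained by implicit differentiation at the fixed point.

```python
import math

def layer(z, x, w):
    """One weight-tied layer (hypothetical toy): the same w is reused at every depth."""
    return math.tanh(w * z + x)

def unrolled(x, w, depth=200):
    """Baseline: apply the layer repeatedly, starting from z = 0."""
    z = 0.0
    for _ in range(depth):
        z = layer(z, x, w)
    return z

def solve_fixed_point(x, w, tol=1e-12, max_iter=50):
    """Secant iteration (1-D quasi-Newton) on g(z) = layer(z, x, w) - z."""
    def g(z):
        return layer(z, x, w) - z
    z0, z1 = 0.0, layer(0.0, x, w)
    g0, g1 = g(z0), g(z1)
    for _ in range(max_iter):
        if abs(g1) < tol or g1 == g0:
            break
        # Right-hand side is evaluated with the old values before assignment.
        z0, z1, g0 = z1, z1 - g1 * (z1 - z0) / (g1 - g0), g1
        g1 = g(z1)
    return z1

def implicit_grad_w(z_star, w):
    """dz*/dw from the implicit function theorem; uses only z*, no stored iterates."""
    s = 1.0 - z_star ** 2          # tanh'(u) at the fixed point, since tanh(u) = z*
    return z_star * s / (1.0 - w * s)

x, w = 0.5, 0.5                    # assumed values; |w| < 1 makes the layer a contraction
z_star = solve_fixed_point(x, w)
grad = implicit_grad_w(z_star, w)
```

Because `implicit_grad_w` depends only on the converged value `z_star`, none of the intermediate iterates need to be kept for the backward pass, which is the source of the memory saving noted above. In the scalar case the gradient is a closed-form division. In the vector case the corresponding linear system would itself be solved iteratively.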
Jan-21-2025, 04:37:23 GMT