Goto

Collaborating Authors

 normalization



Nearly Optimal Bounds for Cyclic Forgetting

Neural Information Processing Systems

One challenge of continual learning is "catastrophic forgetting" [Had+20; VT19; Kem+18]: A model However, if contexts similar to A arise repeatedly, this may be undesirable.. In machine learning, many data sets display cyclic or periodic patterns.


The Crucial Role of Normalization in Sharpness-Aware Minimization Yan Dai

Neural Information Processing Systems

Sharpness-A ware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks.