Why Do We Need Weight Decay in Modern Deep Learning?
