The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization

Neural Information Processing Systems 

In machine learning, it is often valuable for models to be parsimonious or sparse for a variety of reasons, from memory savings and computational speedups to model interpretability and general-izability.