Dropout Training as Adaptive Regularization

Stefan Wager, Sida Wang, and Percy Liang
Departments of Statistics

Neural Information Processing Systems 

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization.
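
As a small illustration of the adaptive-regularization view, consider the simplest generalized linear model, linear regression with squared-error loss. This is a sketch under stated assumptions, not the paper's general result. If each feature is dropped independently with probability delta and the survivors are rescaled by 1/(1 - delta), the expected noised loss equals the clean loss plus a ridge-type penalty in which each coefficient is weighted by the sum of squares of its own feature column. The function names, the toy data, and the choice of squared-error loss are all assumptions made for this example.

```python
from itertools import product


def squared_loss(X, y, beta):
    """Sum of squared residuals for a linear model with coefficients beta."""
    return sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )


def expected_dropout_loss(X, y, beta, delta):
    """Exact expected loss under dropout, by enumerating every mask.

    Each entry of X is zeroed with probability delta, and the kept entries
    are divided by (1 - delta) so the noised features are unbiased.
    Enumeration costs 2**(n*d) evaluations, so this only suits tiny X.
    """
    n, d = len(X), len(X[0])
    expected = 0.0
    for flat_mask in product([0, 1], repeat=n * d):
        kept = sum(flat_mask)
        prob = (1 - delta) ** kept * delta ** (n * d - kept)
        X_noisy = [
            [X[i][j] * flat_mask[i * d + j] / (1 - delta) for j in range(d)]
            for i in range(n)
        ]
        expected += prob * squared_loss(X_noisy, y, beta)
    return expected


def adaptive_ridge_penalty(X, beta, delta):
    """Penalty delta/(1-delta) * sum_j beta_j**2 * sum_i x_ij**2."""
    d = len(beta)
    return delta / (1 - delta) * sum(
        beta[j] ** 2 * sum(row[j] ** 2 for row in X) for j in range(d)
    )


# Hypothetical toy data: 3 examples, 2 features.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
y = [1.0, 0.0, 2.0]
beta = [0.4, -0.3]
delta = 0.5

lhs = expected_dropout_loss(X, y, beta, delta)
rhs = squared_loss(X, y, beta) + adaptive_ridge_penalty(X, beta, delta)
print(lhs, rhs)  # the two quantities agree up to floating-point error
```

The penalty is adaptive in the sense that its per-coefficient weight depends on the data: a feature column with a larger sum of squares has its coefficient penalized more heavily. For squared-error loss the identity is exact; for other generalized linear models an analogous statement should be expected to hold only approximately.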