On Convergence and Generalization of Dropout Training

Neural Information Processing Systems 

We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations.
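To fix the setting, the following is a minimal sketch (not the authors' implementation) of the forward pass of a two-layer ReLU network with standard inverted dropout applied to the hidden layer; the weight shapes, the keep probability `p`, and the function name `forward` are illustrative assumptions.

```python
import numpy as np

def forward(x, W1, W2, p=0.5, train=True, rng=None):
    """Two-layer ReLU network with dropout on the hidden layer.

    During training, each hidden unit is kept independently with
    probability p and the survivors are scaled by 1/p (inverted
    dropout), so the expected hidden activation matches the
    deterministic network used at test time.
    """
    h = np.maximum(0.0, W1 @ x)           # ReLU hidden layer
    if train:
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) < p    # keep mask, one Bernoulli draw per unit
        h = h * mask / p                  # zero dropped units, rescale the rest
    return W2 @ h                         # linear output layer
```

Averaging the stochastic training-time output over many dropout masks recovers the deterministic `train=False` output in expectation, which is the usual starting point for analyzing dropout as a regularizer.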