R-Drop: Regularized Dropout for Neural Networks
–Neural Information Processing Systems
In this paper, we introduce a simple yet more effective alternative for regularizing the training inconsistency induced by dropout, named R-Drop. Concretely, in each mini-batch training step, each data sample goes through the forward pass twice, and each pass is processed by a different sub model obtained by randomly dropping out some hidden units.
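The two-pass idea can be illustrated with a minimal sketch, assuming a toy two-layer network in NumPy. The names `forward` and `symmetric_kl`, the layer sizes, and the dropout rate are hypothetical choices made for illustration. The symmetric KL divergence is used here only as one way to measure how far apart the two sub models' predictions are; this excerpt does not specify how the two passes are combined into a training loss.

```python
import numpy as np

def forward(x, w1, w2, rng, p_drop=0.3):
    """One forward pass of a toy MLP; a fresh dropout mask is sampled on every call."""
    h = np.maximum(x @ w1, 0.0)                  # ReLU hidden units
    mask = rng.random(h.shape) >= p_drop         # keep each unit with probability 1 - p_drop
    h = h * mask / (1.0 - p_drop)                # inverted-dropout rescaling
    logits = h @ w2
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, eps=1e-12):
    """Per-sample average of KL(p || q) and KL(q || p) (illustrative measure only)."""
    log_ratio = np.log(p + eps) - np.log(q + eps)
    return 0.5 * np.sum((p - q) * log_ratio, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))     # mini-batch of 4 samples with 8 features (assumed sizes)
w1 = rng.normal(size=(8, 16))   # shared weights: both passes use the same parameters
w2 = rng.normal(size=(16, 3))

p1 = forward(x, w1, w2, rng)    # first pass, first random sub model
p2 = forward(x, w1, w2, rng)    # second pass, different dropout mask
gap = symmetric_kl(p1, p2)      # one non-negative discrepancy value per sample
```

Both calls share `w1` and `w2`, so any difference between `p1` and `p2` comes only from the dropout masks. That difference is the dropout-induced inconsistency the abstract refers to.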
Feb-8-2026, 20:15:04 GMT