R-Drop: a simple trick to improve Dropout
In each training step, every data sample is passed through the model twice. Because dropout randomly samples a different sub-model on each forward pass, the two passes produce two output distributions, P_1(y|x) and P_2(y|x). R-Drop trains these two distributions to be consistent by minimizing the bidirectional KL divergence between them. The final loss combines the negative log-likelihood (cross-entropy) loss L_NLL with the bidirectional KL term L_KL, i.e. L = L_NLL + α · L_KL, where α weights the regularizer. The KL divergence is measured in both directions, KL(P_1 || P_2) and KL(P_2 || P_1), and the two are averaged.
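The loss described above can be sketched as follows. This is a minimal NumPy illustration, not the official implementation; the function names (`r_drop_loss`, `kl_div`) and the default α = 1.0 are assumptions for the example. `logits1` and `logits2` stand for the outputs of the two dropout passes over the same batch.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch dimension.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def r_drop_loss(logits1, logits2, labels, alpha=1.0):
    # L = L_NLL + alpha * L_KL, where L_KL is the average of the
    # bidirectional KL between the two dropout passes.
    p1, p2 = softmax(logits1), softmax(logits2)
    idx = np.arange(len(labels))
    # Cross-entropy (NLL) is computed on both passes.
    nll = (-np.mean(np.log(p1[idx, labels] + 1e-12))
           - np.mean(np.log(p2[idx, labels] + 1e-12)))
    # Bidirectional KL, averaged over the two directions.
    kl = 0.5 * (kl_div(p1, p2) + kl_div(p2, p1))
    return nll + alpha * kl
```

In a real training loop the same batch is simply fed through the network twice with dropout enabled; the two sets of logits then go into a loss of this shape.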
Oct-27-2021, 22:55:29 GMT
- Technology