larochelle
Score-basedGenerativeNeuralNetworksfor Large-ScaleOptimalTransport
Comparison of statistical distances can also enable distribution testing, quantification of distribution shifts, and provide methods to correct for distribution shift through domainadaptation[12]. Optimal transport theory provides a rich set of tools for comparing distributions inWasserstein Distance.
f4f2f2b3c67da711df6df557fc870c4a-Paper-Conference.pdf
We find that the inconsistency between training and inference of BN is the leading cause that results in the failure of BN in NLP. We define Training Inference Discrepancy (TID) to quantitatively measure this inconsistencyand reveal that TID can indicate BN'sperformance, supported by extensiveexperiments,includingimageclassification,neuralmachinetranslation, language modeling, sequence labeling, andtextclassification tasks.