r-drop
R-Drop: Regularized Dropout for Neural Networks
In this paper, we introduce a simple yet more effective alternative to regularize the training inconsistency induced by dropout, named R-Drop. Concretely, in each mini-batch training, each data sample goes through the forward pass twice, and each pass is processed by a different sub model by randomly dropping out some hidden units.
R-Drop: Regularized Dropout for Neural Networks
Dropout is a powerful and widely used technique to regularize the training of deep neural networks. Though effective and performing well, the randomness introduced by dropout causes non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on $\bf{5}$ widely used deep learning tasks ($\bf{18}$ datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performances with the vanilla Transformer model on WMT14 English$\to$German translation ($\bf{30.91}$ BLEU score).
Optimal Multi-Task Learning at Regularization Horizon for Speech Translation Task
End-to-end speech-to-text translation typically suffers from the scarcity of paired speech-text data. One way to overcome this shortcoming is to utilize the bitext data from the Machine Translation (MT) task and perform Multi-Task Learning (MTL). In this paper, we formulate MTL from a regularization perspective and explore how sequences can be regularized within and across modalities. By thoroughly investigating the effect of consistency regularization (different modality) and R-drop (same modality), we show how they respectively contribute to the total regularization. We also demonstrate that the coefficient of MT loss serves as another source of regularization in the MTL setting. With these three sources of regularization, we introduce the optimal regularization contour in the high-dimensional space, called the regularization horizon. Experiments show that tuning the hyperparameters within the regularization horizon achieves near state-of-the-art performance on the MuST-C dataset.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.69)
Appendix for "R-Drop: Regularized Dropout for Neural Networks "
We provide more detailed settings for the experiments of each task in this part. A.1 Neural Machine Translation For all the NMT tasks, we use the public datasets from the IWSLT competitions. After tokenization, the resulting vocabularies for the IWSLT datasets are near 10k, while for the WMT datasets, the vocabulary size is about 32k. To train the Transformer-based NMT models, we use the transformer_iwslt_de_en configuration for IWSLT translations, which has 6 layers in both encoder and decoder, embedding size 512, feed-forward size 1,024, attention heads 4, dropout value 0.3, and weight decay 0.0001. Label smoothing [12] is adopted with value 0.1. To evaluate the performance, we use multi-bleu.perl.
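The label smoothing mentioned above (value 0.1) replaces the one-hot target with a softened distribution before computing cross-entropy. Below is a minimal numpy sketch of one common variant, which puts `1 - eps` mass on the target class and spreads `eps` uniformly over all classes; actual implementations (e.g., fairseq's label-smoothed criterion) differ slightly in normalization details.

```python
import numpy as np

def label_smoothed_nll(log_probs, target, eps=0.1):
    """Label-smoothed cross-entropy: put (1 - eps) probability mass on
    the target class and spread eps uniformly over all classes."""
    n = log_probs.shape[-1]
    smooth = np.full(n, eps / n)       # uniform share for every class
    smooth[target] += 1.0 - eps        # remaining mass on the true label
    return -np.dot(smooth, log_probs)  # cross-entropy with soft targets
```

With `eps=0` this reduces to the plain negative log-likelihood of the target class.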
Text Augmentations with R-drop for Classification of Tweets Self Reporting Covid-19
Francis, Sumam, Moens, Marie-Francine
This paper presents models created for the Social Media Mining for Health 2023 shared task. Our team addressed the first task, classifying tweets that self-report Covid-19 diagnosis. Our approach involves a classification model that incorporates diverse textual augmentations and utilizes R-drop to augment data and mitigate overfitting, boosting model efficacy. Our leading model, enhanced with R-drop and augmentations like synonym substitution, reserved words, and back translations, outperforms the task mean and median scores. Our system achieves an impressive F1 score of 0.877 on the test set.
R-Drop, a simple trick to improve Dropout
In each training step, each data sample goes through the model twice. Each pass is processed by a different sub-model sampled by dropout. The two output distributions P_1(y|x) and P_2(y|x) are trained to be consistent by minimizing the bidirectional KL divergence between them. The final loss is given by the equation below, which combines the negative log-likelihood (cross-entropy) loss L_NLL and the bidirectional KL divergence L_KL. The KL divergence is measured in both directions, KL(P_1 || P_2) and KL(P_2 || P_1), and the average of the two is taken.
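The loss described above can be sketched in a few lines of numpy. This is a single-sample illustration, not the paper's implementation: `logits1` and `logits2` stand for the outputs of the two dropout passes, and `alpha` is the weight on the KL term.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    """KL(p || q) over the last axis."""
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def r_drop_loss(logits1, logits2, target, alpha=1.0):
    """R-Drop objective: average NLL of the two forward passes plus an
    alpha-weighted symmetric (bidirectional) KL between their outputs."""
    p1, p2 = softmax(logits1), softmax(logits2)
    nll = -0.5 * (np.log(p1[..., target]) + np.log(p2[..., target]))
    l_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * l_kl
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to the ordinary cross-entropy; the larger the disagreement between the two dropout sub-models, the stronger the consistency penalty.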
The Data of 2030
There is a natural synergy (yes, we're using that word) among the many subcategories that make up the AI world. It would be impossible to talk about synthetic data without talking about machine learning, computer vision, software, ethics, privacy (and neural rendering, and GANs, and our Marketing and Sales director Michael's daughter's book Neural Networks for Babies) -- so that's why we don't do that. But synthetic data remains the apple of our eye. So we were thrilled to discover that Gartner Inc.'s June Report predicts that by 2030, the most used type of data in AI will be synthetic. Modernization can be a tricky thing, especially when it requires industry-wide adjustments.
- Information Technology > Services (0.71)
- Health & Medicine > Therapeutic Area > Oncology (0.34)