Regularizing Recurrent Neural Networks via Sequence Mixup

Armin Karamzade, Amir Najafi, Seyed Abolfazl Motahari

arXiv.org Machine Learning 

Recurrent neural networks are the basis of state-of-the-art models in natural language processing, including language modeling (Mikolov et al., 2011), machine translation (Cho et al., 2014) and named entity recognition (Lample et al., 2016). Complex learning tasks typically require relatively large networks with millions of parameters. However, large neural networks need more data and/or strong regularization techniques to be trained successfully and avoid overfitting. When collecting more data is not an option, as is the case in the majority of real-world problems, data augmentation and regularization methods are the standard practices for overcoming this barrier. Data augmentation in natural language processing is limited and often task-specific (Kobayashi, 2018; Kafle et al., 2017). On the other hand, adopting regularization methods originally proposed for feed-forward (non-recurrent) networks requires extra care to avoid hurting the network's information flow between consecutive time steps. To overcome such limitations, we present Sequence Mixup: a set of training methods, regularization techniques, and data augmentation procedures for RNNs. Sequence Mixup can be considered the RNN generalization of input mixup (Zhang et al., 2017) and manifold mixup (Verma et al., 2018), which were originally introduced for feed-forward neural networks.
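For intuition, the sketch below illustrates the generic input-mixup idea of Zhang et al. (2017) applied to a batch of sequences and their labels; it is a minimal illustration of the underlying interpolation principle, not the paper's exact Sequence Mixup procedure, and the function name `mixup_sequences` and the Beta parameter `alpha` are illustrative assumptions.

```python
# Minimal sketch of input mixup (Zhang et al., 2017) on a batch of sequences.
# This is NOT the paper's Sequence Mixup algorithm; it only shows the convex
# interpolation of inputs and labels that Sequence Mixup generalizes to RNNs.
import torch


def mixup_sequences(x, y, alpha=0.2):
    """Convexly combine a batch of sequences and their (soft) labels.

    x: float tensor of shape (batch, time, features), e.g. embedded tokens.
    y: float tensor of shape (batch, num_classes), one-hot or soft labels.
    alpha: parameter of the Beta(alpha, alpha) distribution for the mixing weight.
    """
    # Sample a single mixing coefficient lambda ~ Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Pair each example with a randomly permuted partner from the same batch.
    idx = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[idx]
    y_mixed = lam * y + (1.0 - lam) * y[idx]
    return x_mixed, y_mixed


if __name__ == "__main__":
    batch, time, feat, classes = 4, 10, 8, 3
    x = torch.randn(batch, time, feat)                       # toy embedded sequences
    y = torch.eye(classes)[torch.randint(0, classes, (batch,))]  # toy one-hot labels
    xm, ym = mixup_sequences(x, y, alpha=0.2)
    print(xm.shape, ym.shape)  # torch.Size([4, 10, 8]) torch.Size([4, 3])
```

Manifold mixup applies the same interpolation to intermediate hidden representations rather than raw inputs; extending either variant to RNNs must respect the recurrence across time steps, which is the focus of the paper.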
