DropDim: A Regularization Method for Transformer Networks

Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

Apr-20-2023–arXiv.org Artificial Intelligence

We introduce DropDim, a structured dropout method designed to regularize the self-attention mechanism, a key component of the transformer. In contrast to general dropout, which randomly drops individual neurons, DropDim drops part of the embedding dimensions, so the semantic information carried by those dimensions is discarded completely. This breaks excessive co-adaptation between embedding dimensions and forces self-attention to encode meaningful features with a certain number of embedding dimensions erased. Experiments on a wide range of tasks on the MuST-C English-German dataset show that DropDim effectively improves model performance, reduces over-fitting, and complements other regularization methods. When combined with label smoothing, the word error rate (WER) is reduced from 19.1% to 15.1% on the automatic speech recognition (ASR) task, and the BLEU score increases from 26.90 to 28.38 on the machine translation (MT) task. On the speech translation (ST) task, the model reaches a BLEU score of 22.99, an increase of 1.86 BLEU points over the strong baseline.
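
The core idea, dropping whole embedding dimensions rather than individual neurons, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `drop_dim`, the choice of an independent random decision per dimension, and the `1 / (1 - p)` rescaling (borrowed from standard inverted dropout) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def drop_dim(x, p, rng=None, training=True):
    """Zero entire embedding dimensions of x.

    x is a list of seq_len vectors, each a list of d_model floats.
    p is the probability of dropping each embedding dimension.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return [row[:] for row in x]
    rng = rng or random.Random()
    d_model = len(x[0])
    # One keep/drop decision per embedding dimension, shared by every
    # position in the sequence (general dropout would instead draw one
    # decision per element).
    keep = [rng.random() >= p for _ in range(d_model)]
    # Assumption: rescale kept dimensions as in inverted dropout.
    scale = 1.0 / (1.0 - p)
    return [[v * scale if k else 0.0 for v, k in zip(row, keep)] for row in x]


# Usage: 3 positions, 4 embedding dimensions.
x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]
y = drop_dim(x, p=0.5, rng=random.Random(0))
```

Because the mask is shared across positions, each column of `y` is either entirely zero or entirely rescaled, which is what distinguishes this structured variant from element-wise dropout.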

dropdim, machine learning, natural language


Country:
- Europe > Germany (0.24)
- Asia > China
  - Henan Province > Zhengzhou (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (0.95)
  - Machine Learning > Neural Networks (0.70)
