Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data
Hedderich, Michael A., Klakow, Dietrich
Manually labeled corpora are expensive to create and are often unavailable for low-resource languages or domains. Automatic labeling approaches offer a quicker and cheaper way to obtain labeled data. However, these labels often contain more errors, which can degrade the performance of a classifier trained on them. We propose a noise layer that is added to a neural network architecture. This makes it possible to model the noise and to train on a combination of clean and noisy data. We show that in a low-resource NER task, we can improve performance by up to 35% by using additional noisy data and handling the noise.
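The abstract does not spell out how the noise layer works, so the following is only a minimal sketch of one common way such a layer can be built, not necessarily the authors' implementation. It assumes the noise is modeled as a row-stochastic matrix that maps the base model's clean-label distribution to a noisy-label distribution, and that this matrix can be estimated from a small set of tokens carrying both a clean and a noisy label. The function names `estimate_noise_matrix` and `noise_layer` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate p(noisy label | clean label) from paired annotations.

    Assumption: each instance has both a clean and a noisy label.
    Entry [i, j] is the fraction of instances with clean label i
    that received noisy label j.
    """
    counts = np.zeros((num_classes, num_classes))
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[clean, noisy] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes never observed as a clean label fall back to "no noise".
    return np.where(row_sums > 0,
                    counts / np.maximum(row_sums, 1),
                    np.eye(num_classes))


def noise_layer(clean_probs, noise_matrix):
    """Map clean-label probabilities to noisy-label probabilities."""
    return clean_probs @ noise_matrix
```

Under these assumptions, clean instances would be trained against `softmax(logits)` directly, while noisy instances would be trained against `noise_layer(softmax(logits), noise_matrix)`, so that the base model is not forced to reproduce systematic labeling errors.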
Jul-2-2018
- Country:
  - Europe
    - United Kingdom (0.04)
    - France (0.04)
    - Germany > Saarland
      - Saarbrücken (0.04)
- Genre:
  - Research Report (0.82)
- Technology: