Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Michael A. Hedderich, Dietrich Klakow


Abstract
Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are a quicker and cheaper way to obtain labeled data. However, these labels often contain more errors, which can degrade a classifier’s performance when it is trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by using additional, noisy data and handling the noise.
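The abstract does not spell out how the noise layer works, so the following is only a minimal sketch of one common way such a layer can be built, not the authors' implementation. It assumes the layer is a row-stochastic matrix that maps the base model's clean-label distribution to a noisy-label distribution, and that this matrix is estimated by counting on a small set of tokens that have both a clean and a noisy label. The function names (estimate_noise_matrix, noise_layer) and the toy labels are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate p(noisy = j | clean = i) by counting label pairs.

    Assumption: both label sequences refer to the same tokens.
    Classes never seen as a clean label fall back to an identity row.
    """
    counts = np.zeros((num_classes, num_classes))
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[clean, noisy] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    normalized = counts / np.maximum(row_sums, 1)
    return np.where(row_sums > 0, normalized, np.eye(num_classes))


def noise_layer(clean_probs, noise_matrix):
    """Map clean-label probabilities to noisy-label probabilities."""
    return clean_probs @ noise_matrix


# Toy example with three hypothetical classes (e.g. O, PER, LOC).
clean = [0, 0, 1, 1, 1, 2]
noisy = [0, 1, 1, 1, 0, 2]
noise_matrix = estimate_noise_matrix(clean, noisy, num_classes=3)

# Base-model output for one token; on noisy data the loss would be
# computed on noisy_probs, on clean data directly on clean_probs.
clean_probs = softmax(np.array([[2.0, 1.0, 0.1]]))
noisy_probs = noise_layer(clean_probs, noise_matrix)
```

Under these assumptions, the base model is trained on clean examples directly and on noisy examples through the extra layer, so label errors are absorbed by the matrix rather than by the classifier itself.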
Anthology ID:
W18-3402
Volume:
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month:
July
Year:
2018
Address:
Melbourne
Editors:
Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
12–18
URL:
https://aclanthology.org/W18-3402/
DOI:
10.18653/v1/W18-3402
Cite (ACL):
Michael A. Hedderich and Dietrich Klakow. 2018. Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 12–18, Melbourne. Association for Computational Linguistics.
Cite (Informal):
Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data (Hedderich & Klakow, ACL 2018)
PDF:
https://aclanthology.org/W18-3402.pdf
Data
CoNLL 2003