NAT: Noise-Aware Training for Robust Neural Sequence Labeling

Marcin Namysl, Sven Behnke, Joachim Köhler


Abstract
Sequence labeling systems should perform reliably not only under ideal conditions but also with corrupted inputs, as these systems often process user-generated text or follow an error-prone upstream component. To this end, we formulate the noisy sequence labeling problem, where the input may undergo an unknown noising process, and propose two Noise-Aware Training (NAT) objectives that improve the robustness of sequence labeling performed on perturbed input. Our data augmentation method trains a neural model using a mixture of clean and noisy samples, whereas our stability training algorithm encourages the model to create a noise-invariant latent representation. We employ a vanilla noise model at training time. For evaluation, we use both the original data and its variants perturbed with real OCR errors and misspellings. Extensive experiments on English and German named entity recognition benchmarks confirmed that NAT consistently improved the robustness of popular sequence labeling models while preserving accuracy on the original input. We make our code and data publicly available for the research community.
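The two objectives named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the released code: the function names, the uniform character-level noise (insert, delete, substitute), the weight `alpha`, and the use of a KL divergence between the model's clean and noisy output distributions as the stability term. The abstract itself only states that clean and noisy samples are mixed and that a noise-invariant representation is encouraged.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed character set for the noise


def perturb_token(token, error_rate, rng):
    """Hypothetical 'vanilla' noise: each character is independently
    corrupted with probability error_rate by a random insertion,
    deletion, or substitution."""
    out = []
    for ch in token:
        if rng.random() < error_rate:
            op = rng.choice(("insert", "delete", "substitute"))
            if op == "insert":
                out.extend((ch, rng.choice(ALPHABET)))
            elif op == "substitute":
                out.append(rng.choice(ALPHABET))
            # "delete": the character is simply dropped
        else:
            out.append(ch)
    # Keep the original token if every character was deleted, so the
    # token count (and thus the label alignment) never changes.
    return "".join(out) or token


def perturb_sentence(tokens, error_rate, rng):
    """Perturb each token; the output has the same length as the input."""
    return [perturb_token(t, error_rate, rng) for t in tokens]


def augmentation_loss(loss_clean, loss_noisy, alpha=1.0):
    """Assumed form: supervised loss on clean input plus a weighted
    supervised loss on the noisy copy of the same sample."""
    return loss_clean + alpha * loss_noisy


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def stability_loss(loss_clean, probs_clean, probs_noisy, alpha=1.0):
    """Assumed form: supervised loss on clean input plus a weighted
    penalty when the noisy output distribution drifts from the clean one."""
    return loss_clean + alpha * kl_divergence(probs_clean, probs_noisy)


rng = random.Random(0)
noisy = perturb_sentence(["Berlin", "is", "in", "Germany"], 0.2, rng)
print(noisy)
```

In this sketch the augmentation objective needs labels for the noisy copy, which is why the noise function keeps the number of tokens fixed, while the stability term only compares two model outputs and needs no labels for the perturbed input.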
Anthology ID:
2020.acl-main.138
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1501–1517
URL:
https://aclanthology.org/2020.acl-main.138
DOI:
10.18653/v1/2020.acl-main.138
Cite (ACL):
Marcin Namysl, Sven Behnke, and Joachim Köhler. 2020. NAT: Noise-Aware Training for Robust Neural Sequence Labeling. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1501–1517, Online. Association for Computational Linguistics.
Cite (Informal):
NAT: Noise-Aware Training for Robust Neural Sequence Labeling (Namysl et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.138.pdf
Software:
 2020.acl-main.138.Software.zip
Video:
 http://slideslive.com/38928783
Code:
 mnamysl/nat-acl2020
Data:
 CoNLL 2003