A Bayesian Approach for Sequence Tagging with Crowds

Edwin Simpson, Iryna Gurevych


Abstract
Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation error. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art, and that Bayesian approaches outperform non-Bayesian alternatives. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.
Anthology ID: D19-1101
Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues: EMNLP | IJCNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1093–1104
URL: https://aclanthology.org/D19-1101/
DOI: 10.18653/v1/D19-1101
Cite (ACL):
Edwin Simpson and Iryna Gurevych. 2019. A Bayesian Approach for Sequence Tagging with Crowds. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1093–1104, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
A Bayesian Approach for Sequence Tagging with Crowds (Simpson & Gurevych, EMNLP-IJCNLP 2019)
PDF: https://aclanthology.org/D19-1101.pdf
Code: UKPLab/arxiv2018-bayesian-ensembles
Data: CoNLL 2003