Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations

Xin Zhang, Guangwei Xu, Yueheng Sun, Meishan Zhang, Xiaobin Wang, Min Zhang


Abstract
Recent works of opinion expression identification (OEI) rely heavily on the quality and scale of the manually-constructed training corpus, which could be extremely difficult to satisfy. Crowdsourcing is one practical solution for this problem, aiming to create a large-scale but quality-unguaranteed corpus. In this work, we investigate Chinese OEI with extremely-noisy crowdsourcing annotations, constructing a dataset at a very low cost. Following Zhang et al. (2021), we train the annotator-adapter model by regarding all annotations as gold-standard in terms of crowd annotators, and test the model by using a synthetic expert, which is a mixture of all annotators. As this annotator-mixture for testing is never modeled explicitly in the training phase, we propose to generate synthetic training samples by a pertinent mixup strategy to make the training and testing highly consistent. The simulation experiments on our constructed dataset show that crowdsourcing is highly promising for OEI, and our proposed annotator-mixup can further enhance the crowdsourcing modeling.
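The train/test mismatch described in the abstract can be illustrated with a minimal sketch. The abstract does not give the mixing rule, so this assumes each annotator is represented by an embedding vector, that the synthetic expert is a uniform average of those vectors, and that the mixup is a standard convex combination with a Beta-sampled coefficient. All function names and the `alpha` parameter are hypothetical and are not taken from the paper or its released code.

```python
import random


def synthetic_expert(annotator_embeddings):
    """Average all annotator vectors into one vector (assumed uniform mixture)."""
    n = len(annotator_embeddings)
    dim = len(annotator_embeddings[0])
    return [sum(vec[i] for vec in annotator_embeddings) / n for i in range(dim)]


def mixup_embeddings(emb_a, emb_b, lam):
    """Convex combination of two annotator vectors with weight lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]


def mixup_loss(loss_a, loss_b, lam):
    """Combine the two annotators' losses with the same weight as the embeddings."""
    return lam * loss_a + (1.0 - lam) * loss_b


def sample_lambda(alpha=0.5, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    # Two toy annotators with 2-dimensional embeddings (illustrative values only).
    annotators = [[1.0, 0.0], [0.0, 1.0]]
    lam = sample_lambda()
    mixed = mixup_embeddings(annotators[0], annotators[1], lam)
    print("expert:", synthetic_expert(annotators))
    print("mixed training vector:", mixed)
```

Under these assumptions, training on mixed vectors exposes the model to points between individual annotators, which is the region the averaged expert vector occupies at test time.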
Anthology ID:
2022.acl-long.200
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2801–2813
URL:
https://aclanthology.org/2022.acl-long.200
DOI:
10.18653/v1/2022.acl-long.200
Cite (ACL):
Xin Zhang, Guangwei Xu, Yueheng Sun, Meishan Zhang, Xiaobin Wang, and Min Zhang. 2022. Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2801–2813, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations (Zhang et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.200.pdf
Software:
2022.acl-long.200.software.zip
Code:
izhx/crowd-oei
Data:
MPQA Opinion Corpus