Sequence Effects in Crowdsourced Annotations

Nitika Mathur, Timothy Baldwin, Trevor Cohn


Abstract
Manual data annotation is a vital component of NLP research. When designing annotation tasks, properties of the annotation interface can unintentionally lead to artefacts in the resulting dataset, biasing the evaluation. In this paper, we explore sequence effects, where annotations of an item are affected by the preceding items. Having assigned one label to an instance, the annotator may be less (or more) likely to assign the same label to the next. During rating tasks, seeing a low-quality item may affect the score given to the next item either positively or negatively. We see clear evidence of both types of effects using auto-correlation studies over three different crowdsourced datasets. We then recommend a simple way to minimise sequence effects.
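
To make the idea of an auto-correlation study concrete, here is a minimal sketch that computes the lag-k Pearson correlation of one annotator's labels in the order they were given. It is an illustration only and not necessarily the analysis used in the paper; the function name `lag_autocorrelation` and the example label sequences are hypothetical.

```python
from statistics import mean


def lag_autocorrelation(labels, lag=1):
    """Pearson correlation between a label sequence and itself shifted by `lag`.

    `labels` is one annotator's numeric labels in presentation order
    (an assumed input format, chosen for this sketch).
    """
    if lag < 1 or len(labels) <= lag + 1:
        raise ValueError("need at least lag + 2 labels")
    x = labels[:-lag]   # label at position t
    y = labels[lag:]    # label at position t + lag
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant sequence: correlation is undefined, report 0
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sequences: strict alternation vs. a steadily rising score.
alternating = [0, 1, 0, 1, 0, 1]
rising = [1, 2, 3, 4, 5]
r_alternating = lag_autocorrelation(alternating)
r_rising = lag_autocorrelation(rising)
```

Under this sketch, a value near zero is what independent judgements would produce when items are shown in random order. A clearly negative value suggests the annotator tends to avoid repeating the previous label, and a clearly positive value suggests the previous label pulls the next one towards it.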
Anthology ID:
D17-1306
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2860–2865
URL:
https://aclanthology.org/D17-1306
DOI:
10.18653/v1/D17-1306
Cite (ACL):
Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2017. Sequence Effects in Crowdsourced Annotations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2860–2865, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Sequence Effects in Crowdsourced Annotations (Mathur et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1306.pdf
Video:
 https://vimeo.com/238235659