The Language Demographics of Amazon Mechanical Turk

Ellie Pavlick; Matt Post; Ann Irvine; Dmitry Kachaev; Chris Callison-Burch

doi:10.1162/tacl_a_00167

Abstract

We present a large-scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk). We establish a methodology for determining the language skills of anonymous crowd workers that is more robust than simple surveying. We validate workers’ self-reported language skills in two ways: by measuring their ability to correctly translate words, and by geolocating workers to see whether they reside in countries where those languages are likely to be spoken. Rather than posting a one-off survey, we posted paid tasks consisting of 1,000 assignments to translate a total of 10,000 words in each of 100 languages. Our study ran for several months and was highly visible on the MTurk crowdsourcing platform, increasing the chances that bilingual workers would complete it. The study served both to create bilingual dictionaries and to act as a census of the bilingual speakers on MTurk. We use this data to recommend the languages with the largest speaker populations as good candidates for other researchers who want to develop crowdsourced, multilingual technologies. To further demonstrate the value of creating data via crowdsourcing, we hire workers to create bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems.
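The two validation signals described in the abstract (translation accuracy and geolocation) can be illustrated with a minimal sketch. This is not the authors' code: the `Assignment` record, the `gold` dictionary of known-good translations for control words, the `speaker_countries` mapping, and the `min_accuracy` threshold are all hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical record of one worker's submitted translation assignment."""
    worker_id: str
    language: str               # language code the worker claims to speak
    answers: dict[str, str]     # foreign word -> worker's English translation
    country: str                # hypothetical: country code from IP geolocation


def control_accuracy(answers: dict[str, str], gold: dict[str, set[str]]) -> float | None:
    """Fraction of control words (those with known translations) answered correctly.

    Returns None when the assignment contains no control words.
    """
    controls = [w for w in answers if w in gold]
    if not controls:
        return None
    correct = sum(1 for w in controls if answers[w].strip().lower() in gold[w])
    return correct / len(controls)


def validate(
    assignment: Assignment,
    gold: dict[str, set[str]],
    speaker_countries: dict[str, set[str]],
    min_accuracy: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> tuple[bool, bool]:
    """Return (passes_translation_check, located_where_language_is_spoken)."""
    acc = control_accuracy(assignment.answers, gold)
    passes_translation = acc is not None and acc >= min_accuracy
    in_region = assignment.country in speaker_countries.get(assignment.language, set())
    return passes_translation, in_region


if __name__ == "__main__":
    gold = {"casa": {"house", "home"}, "perro": {"dog"}}
    speaker_countries = {"es": {"ES", "MX"}}
    a = Assignment("W1", "es", {"casa": "House", "perro": "cat", "libro": "book"}, "MX")
    print(validate(a, gold, speaker_countries))  # prints (True, True)
```

Keeping the two signals separate, rather than folding them into one score, reflects that they answer different questions: the control words test skill directly, while geolocation is only circumstantial evidence that a claimed language is plausible for that worker.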

Anthology ID:: Q14-1007
Volume:: Transactions of the Association for Computational Linguistics, Volume 2
Month:
Year:: 2014
Address:: Cambridge, MA
Editors:: Dekang Lin, Michael Collins, Lillian Lee
Venue:: TACL
SIG:
Publisher:: MIT Press
Note:
Pages:: 79–92
Language:
URL:: https://aclanthology.org/Q14-1007/
DOI:: 10.1162/tacl_a_00167
Bibkey:
Cite (ACL):: Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev, and Chris Callison-Burch. 2014. The Language Demographics of Amazon Mechanical Turk. Transactions of the Association for Computational Linguistics, 2:79–92.
Cite (Informal):: The Language Demographics of Amazon Mechanical Turk (Pavlick et al., TACL 2014)
PDF:: https://aclanthology.org/Q14-1007.pdf
Video:: https://aclanthology.org/Q14-1007.mp4