GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets

Njall Skarphedinsson, Breki Gudmundsson, Steinar Smari, Marta Kristin Larusdottir, Hafsteinn Einarsson, Abuzar Khan, Eric Nyberg, Hrafn Loftsson


Abstract
The methods used to create many of the well-known Question-Answering (QA) datasets are hard to replicate for low-resource languages. A commonality amongst these methods is hiring annotators to source answers from the internet by querying a single answer source, such as Wikipedia. Applying these methods for low-resource languages can be problematic since there is no single large answer source for these languages. Consequently, this can result in a high ratio of unanswered questions, since the amount of information in any single source is limited. To address this problem, we developed a novel crowd-sourcing platform to gather multiple-domain QA data for low-resource languages. Our platform, which consists of a mobile app and a web API, gamifies the data collection process. We successfully released the app for Icelandic (a low-resource language with about 350,000 native speakers) to build a dataset which rivals large QA datasets for high-resource languages both in terms of size and ratio of answered questions. We have made the platform open source with instructions on how to localize and deploy it to gather data for other low-resource languages.
Anthology ID:
2023.eacl-demo.18
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Danilo Croce, Luca Soldaini
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
152–160
URL:
https://aclanthology.org/2023.eacl-demo.18
DOI:
10.18653/v1/2023.eacl-demo.18
Cite (ACL):
Njall Skarphedinsson, Breki Gudmundsson, Steinar Smari, Marta Kristin Larusdottir, Hafsteinn Einarsson, Abuzar Khan, Eric Nyberg, and Hrafn Loftsson. 2023. GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 152–160, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets (Skarphedinsson et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-demo.18.pdf
Video:
https://aclanthology.org/2023.eacl-demo.18.mp4