Abuzar Khan


pdf bib
GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets
Njall Skarphedinsson | Breki Gudmundsson | Steinar Smari | Marta Kristin Larusdottir | Hafsteinn Einarsson | Abuzar Khan | Eric Nyberg | Hrafn Loftsson
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

The methods used to create many of the well-known Question-Answering (QA) datasets are hard to replicate for low-resource languages. A commonality amongst these methods is hiring annotators to source answers from the internet by querying a single answer source, such as Wikipedia. Applying these methods for low-resource languages can be problematic since there is no single large answer source for these languages. Consequently, this can result in a high ratio of unanswered questions, since the amount of information in any single source is limited. To address this problem, we developed a novel crowd-sourcing platform to gather multiple-domain QA data for low-resource languages. Our platform, which consists of a mobile app and a web API, gamifies the data collection process. We successfully released the app for Icelandic (a low-resource language with about 350,000 native speakers) to build a dataset which rivals large QA datasets for high-resource languages both in terms of size and ratio of answered questions. We have made the platform open source with instructions on how to localize and deploy it to gather data for other low-resource languages.