Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque

Arantxa Otegi; Aitor Agirre; Jon Ander Campos; Aitor Soroa; Eneko Agirre

Abstract

Conversational Question Answering (CQA) systems meet users' information needs by conversing with them, retrieving the answers to their questions from text. A variety of datasets exist for English, with tens of thousands of training examples, and pre-trained language models have made it possible to obtain impressive results. The goal of our research is to test the performance of CQA systems under low-resource conditions, which are common for most non-English languages: small amounts of native annotations and other limitations linked to low-resource languages, such as a lack of crowdworkers or smaller Wikipedias. We focus on the Basque language, and present the first non-English CQA dataset and results. Our experiments show that it is possible to obtain good results with small amounts of native data thanks to cross-lingual transfer, with quality comparable to that obtained for English. We also discovered that dialogue history models are not directly transferable to another language, calling for further research. The dataset is publicly available.

Anthology ID: 2020.lrec-1.55
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 436–442
Language: English
URL: https://aclanthology.org/2020.lrec-1.55/
Cite (ACL): Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, and Eneko Agirre. 2020. Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 436–442, Marseille, France. European Language Resources Association.
Cite (Informal): Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (Otegi et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.55.pdf
