Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora

Delphine Charlet, Geraldine Damnati, Frederic Bechet, Gabriel Marzinotto, Johannes Heinecke


Abstract
Machine Reading received recently a lot of attention thanks to both the availability of very large corpora such as SQuAD or MS MARCO containing triplets (document, question, answer), and the introduction of Transformer Language Models such as BERT which obtain excellent results, even matching human performance according to the SQuAD leaderboard. One of the key features of Transformer Models is their ability to be jointly trained across multiple languages, using a shared subword vocabulary, leading to the construction of cross-lingual lexical representations. This feature has been used recently to perform zero-shot cross-lingual experiments where a multilingual BERT model fine-tuned on a machine reading comprehension task exclusively for English was directly applied to Chinese and French documents with interesting performance. In this paper we study the cross-language and cross-domain capabilities of BERT on a Machine Reading Comprehension task on two corpora: SQuAD and a new French Machine Reading dataset, called CALOR-QUEST. The semantic annotation available on CALOR-QUEST allows us to give a detailed analysis on the kinds of questions that are properly handled through the cross-language process. We will try to answer this question: which factor between language mismatch and domain mismatch has the strongest influence on the performances of a Machine Reading Comprehension task?
Anthology ID:
2020.lrec-1.674
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
5491–5497
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.674
DOI:
Bibkey:
Cite (ACL):
Delphine Charlet, Geraldine Damnati, Frederic Bechet, Gabriel Marzinotto, and Johannes Heinecke. 2020. Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5491–5497, Marseille, France. European Language Resources Association.
Cite (Informal):
Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora (Charlet et al., LREC 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.lrec-1.674.pdf