What does BERT Learn from Arabic Machine Reading Comprehension Datasets?

Eman Albilali, Nora Altwairesh, Manar Hosny


Abstract
In machine reading comprehension tasks, a model must extract an answer from a given passage in response to a question. Recently, transformer-based pre-trained language models have achieved state-of-the-art performance on several natural language processing tasks. However, it is unclear whether such performance reflects true language understanding. In this paper, we propose adversarial examples to probe an Arabic pre-trained language model (AraBERT), leading to a significant performance drop across four Arabic machine reading comprehension datasets. We present a layer-wise analysis of the transformer's hidden states to offer insights into how AraBERT reasons to derive an answer. The experiments indicate that AraBERT relies on superficial cues and keyword matching rather than text understanding. Furthermore, hidden-state visualization demonstrates that prediction errors can be recognized from vector representations in earlier layers.
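As a rough illustration of the layer-wise probing described in the abstract (not the authors' released code), the sketch below extracts per-layer hidden states from an AraBERT checkpoint with Hugging Face Transformers for a question-passage pair; the checkpoint name and the placeholder Arabic strings are assumptions for illustration only.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed AraBERT checkpoint; any AraBERT variant with the same interface works.
model_name = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

# Hypothetical question-passage pair (replace with real data from an Arabic MRC set).
question = "من مؤلف الكتاب؟"
passage = "ألّف الكاتب الكتاب في عام ١٩٥٠."

inputs = tokenizer(question, passage, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states holds the embedding output plus one tensor per layer,
# each of shape (batch_size, sequence_length, hidden_size); these per-layer
# representations are what a layer-wise analysis or visualization would inspect.
for layer_idx, layer_states in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(layer_states.shape))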
Anthology ID:
2021.wanlp-1.4
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
32–41
URL:
https://aclanthology.org/2021.wanlp-1.4
Cite (ACL):
Eman Albilali, Nora Altwairesh, and Manar Hosny. 2021. What does BERT Learn from Arabic Machine Reading Comprehension Datasets?. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 32–41, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
What does BERT Learn from Arabic Machine Reading Comprehension Datasets? (Albilali et al., WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.4.pdf
Data
MLQA, TyDiQA, XQuAD