Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models

Jieyu Lin, Jiajie Zou, Nai Ding


Abstract
Pre-trained language models have achieved human-level performance on many Machine Reading Comprehension (MRC) tasks, but it remains unclear whether these models truly understand language or answer questions by exploiting statistical biases in datasets. Here, we demonstrate a simple yet effective method to attack MRC models and reveal the statistical biases in these models. We apply the method to the RACE dataset, in which the answer to each MRC question is selected from 4 options. We find that several pre-trained language models, including BERT, ALBERT, and RoBERTa, show a consistent preference for some options, even when these options are irrelevant to the question. When interfered with by these irrelevant options, the performance of MRC models drops from human level to chance level. Human readers, however, are not clearly affected by these irrelevant options. Finally, we propose an augmented training method that can greatly reduce models’ statistical biases.
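The attack described in the abstract can be sketched as follows: swap each incorrect answer option for a question-irrelevant distractor and count how often the model's prediction flips to the distractor. The `score_options` function below is a hypothetical stand-in for a real fine-tuned MRC model (e.g., BERT/ALBERT/RoBERTa on RACE); here it is a toy scorer that prefers longer options, mimicking a statistical bias.

```python
# Sketch of the irrelevant-option attack from the abstract.
# `score_options` is a hypothetical stand-in for a fine-tuned MRC model
# that returns one score (logit) per answer option.

def score_options(passage, question, options):
    # Toy biased scorer: prefers longer options, standing in for a
    # dataset-induced statistical bias in a real model.
    return [len(opt) for opt in options]

def attack_with_irrelevant_option(passage, question, options, distractor):
    """Replace each non-predicted option in turn with a question-irrelevant
    distractor and count how often the model's prediction flips to it."""
    base_scores = score_options(passage, question, options)
    base_pred = max(range(len(options)), key=lambda i: base_scores[i])
    flips = 0
    for i in range(len(options)):
        if i == base_pred:
            continue
        attacked = list(options)
        attacked[i] = distractor
        scores = score_options(passage, question, attacked)
        if max(range(len(attacked)), key=lambda j: scores[j]) == i:
            flips += 1
    return base_pred, flips
```

With a strongly biased scorer, the prediction flips to the distractor in every attacked position, which is the chance-level collapse the paper reports for real models.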
Anthology ID:
2021.acl-short.43
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
333–342
URL:
https://aclanthology.org/2021.acl-short.43
DOI:
10.18653/v1/2021.acl-short.43
Cite (ACL):
Jieyu Lin, Jiajie Zou, and Nai Ding. 2021. Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 333–342, Online. Association for Computational Linguistics.
Cite (Informal):
Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models (Lin et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-short.43.pdf
Video:
https://aclanthology.org/2021.acl-short.43.mp4
Data
RACE