Benchmarking Machine Reading Comprehension: A Psychological Perspective

Saku Sugawara, Pontus Stenetorp, Akiko Aizawa


Abstract
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. However, the conventional task design of MRC lacks explainability beyond model interpretation, i.e., reading comprehension by a model cannot be explained in human terms. To this end, this position paper provides a theoretical basis for the design of MRC datasets grounded in psychology as well as psychometrics, and summarizes it in terms of the prerequisites for benchmarking MRC. We conclude that future datasets should (i) evaluate a model's capability to construct a coherent and grounded representation for understanding context-dependent situations and (ii) ensure substantive validity through shortcut-proof questions and explanations as part of the task design.
Anthology ID:
2021.eacl-main.137
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1592–1612
URL:
https://aclanthology.org/2021.eacl-main.137
DOI:
10.18653/v1/2021.eacl-main.137
Cite (ACL):
Saku Sugawara, Pontus Stenetorp, and Akiko Aizawa. 2021. Benchmarking Machine Reading Comprehension: A Psychological Perspective. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1592–1612, Online. Association for Computational Linguistics.
Cite (Informal):
Benchmarking Machine Reading Comprehension: A Psychological Perspective (Sugawara et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.137.pdf
Data
SQuAD, Visual Question Answering