How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks

Divyansh Kaushik, Zachary C. Lipton


Abstract
Many recent papers address reading comprehension, where examples consist of (question, passage, answer) tuples. Presumably, a model must combine information from both questions and passages to predict corresponding answers. However, despite intense interest in the topic, with hundreds of published papers vying for leaderboard dominance, basic questions about the difficulty of many popular benchmarks remain unanswered. In this paper, we establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well. On 14 out of 20 bAbI tasks, passage-only models achieve greater than 50% accuracy, sometimes matching the full model. Interestingly, while CBT provides 20-sentence passages, only the last is needed for accurate prediction. By comparison, SQuAD and CNN appear better-constructed.
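The question-only and passage-only baselines described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Example`, `ablate`, `accuracy`, and `last_word_predictor` are hypothetical, and blanking the withheld input with an empty string is an assumption made here for simplicity (the paper's models and its exact way of withholding inputs may differ).

```python
from typing import Callable, List, NamedTuple


class Example(NamedTuple):
    """One (question, passage, answer) tuple, as in the abstract."""
    question: str
    passage: str
    answer: str


def ablate(examples: List[Example], keep: str) -> List[Example]:
    """Return copies of the examples with only one input retained.

    keep="question" blanks the passage; keep="passage" blanks the question.
    Blanking with "" is an illustrative assumption.
    """
    if keep not in ("question", "passage"):
        raise ValueError("keep must be 'question' or 'passage'")
    if keep == "question":
        return [ex._replace(passage="") for ex in examples]
    return [ex._replace(question="") for ex in examples]


def accuracy(predict: Callable[[str, str], str], examples: List[Example]) -> float:
    """Fraction of examples where predict(question, passage) equals the answer."""
    correct = sum(predict(ex.question, ex.passage) == ex.answer for ex in examples)
    return correct / len(examples)


def last_word_predictor(question: str, passage: str) -> str:
    """Toy stand-in for a model: guess the last word of the passage."""
    words = passage.rstrip(".").split()
    return words[-1] if words else ""


if __name__ == "__main__":
    data = [Example("Where is Mary?", "Mary went to the kitchen.", "kitchen")]
    print("passage-only:", accuracy(last_word_predictor, ablate(data, "passage")))
    print("question-only:", accuracy(last_word_predictor, ablate(data, "question")))
```

If a predictor scores well on the passage-only copy of a dataset, the questions contribute little to that score, which is the kind of weakness the abstract reports for several bAbI tasks.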
Anthology ID:
D18-1546
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
5010–5015
URL:
https://aclanthology.org/D18-1546
DOI:
10.18653/v1/D18-1546
Cite (ACL):
Divyansh Kaushik and Zachary C. Lipton. 2018. How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5010–5015, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks (Kaushik & Lipton, EMNLP 2018)
PDF:
https://aclanthology.org/D18-1546.pdf
Video:
https://vimeo.com/306140720
Data
CBT, SQuAD, Who-did-What