RoR: Read-over-Read for Long Document Machine Reading Comprehension

Jing Zhao, Junwei Bao, Yifan Wang, Yongwei Zhou, Youzheng Wu, Xiaodong He, Bowen Zhou


Abstract
Transformer-based pre-trained models, such as BERT, have achieved remarkable results on machine reading comprehension. However, because of the limit on encoding length (e.g., 512 WordPiece tokens), a long document is usually split into multiple chunks that are read independently. As a result, the reading field is restricted to individual chunks, and no information is shared across chunks in long document machine reading comprehension. To address this problem, we propose RoR, a read-over-read method, which expands the reading field from chunk to document. Specifically, RoR includes a chunk reader and a document reader. The former first predicts a set of regional answers for each chunk, which are then compacted into a highly condensed version of the original document that is guaranteed to be encoded in a single pass. The latter then predicts the global answers from this condensed document. Finally, a voting strategy is used to aggregate and rerank the regional and global answers for the final prediction. Extensive experiments on two benchmarks, QuAC and TriviaQA, demonstrate the effectiveness of RoR for long document reading. Notably, RoR ranked first on the QuAC leaderboard (https://quac.ai/) at the time of submission (May 17th, 2021).
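The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `toy_reader` stand-in for a neural reader, the stride value, the order-preserving concatenation used to build the condensed document, and the score-summing vote are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Answer = Tuple[str, float]  # (answer text, confidence score)
Reader = Callable[[str, List[str]], List[Answer]]  # (question, tokens) -> ranked candidates


def split_into_chunks(tokens: List[str], max_len: int, stride: int) -> List[List[str]]:
    """Split a token list into windows of at most max_len tokens, advancing by stride."""
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


def read_over_read(question: str, tokens: List[str], chunk_reader: Reader,
                   document_reader: Reader, max_len: int = 512,
                   stride: int = 128, top_k: int = 2) -> Answer:
    # Stage 1: the chunk reader predicts regional answers for each chunk.
    regional: List[Answer] = []
    for chunk in split_into_chunks(tokens, max_len, stride):
        regional.extend(chunk_reader(question, chunk)[:top_k])

    # Condense: concatenate distinct regional answers in order and truncate so
    # the result fits in one encoding pass (assumed simplification).
    seen, condensed = set(), []
    for text, _ in regional:
        if text not in seen:
            seen.add(text)
            condensed.extend(text.split())
    condensed = condensed[:max_len]

    # Stage 2: the document reader predicts global answers from the condensed text.
    global_answers = document_reader(question, condensed)[:top_k]

    # Voting: sum the scores of identical answer strings (assumed simplification).
    votes = defaultdict(float)
    for text, score in regional + global_answers:
        votes[text] += score
    return max(votes.items(), key=lambda kv: kv[1])


def toy_reader(question: str, tokens: List[str]) -> List[Answer]:
    """Hypothetical reader: favours capitalised tokens that are absent from the question."""
    q = set(question.split())
    scored = [(t, 1.0 if t[:1].isupper() and t not in q else 0.0) for t in tokens]
    return sorted(scored, key=lambda a: a[1], reverse=True)


doc = "the cat sat . Paris is the capital of France . dogs bark".split()
best = read_over_read("What is the capital of France ?", doc,
                      toy_reader, toy_reader, max_len=4, stride=4, top_k=1)
```

In this toy run the document is read as four short chunks, and the candidate found by both the chunk stage and the document stage accumulates the highest vote. In practice both readers would be trained models, which this sketch does not attempt to reproduce.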
Anthology ID:
2021.findings-emnlp.160
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1862–1872
URL:
https://aclanthology.org/2021.findings-emnlp.160
DOI:
10.18653/v1/2021.findings-emnlp.160
Cite (ACL):
Jing Zhao, Junwei Bao, Yifan Wang, Yongwei Zhou, Youzheng Wu, Xiaodong He, and Bowen Zhou. 2021. RoR: Read-over-Read for Long Document Machine Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1862–1872, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
RoR: Read-over-Read for Long Document Machine Reading Comprehension (Zhao et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.160.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.160.mp4
Code:
jd-ai-research-nlp/ror
Data:
CoQA, QuAC, SQuAD, TriviaQA