What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension

Jiawei Wang; Hai Zhao; Yinggong Zhao; Libin Shen

doi:10.18653/v1/2021.findings-emnlp.202

Abstract

Machine reading comprehension (MRC) is a challenging NLP task because it requires careful handling of every linguistic granularity, from word to sentence to passage. For extractive MRC, the answer span has been shown to be determined mostly by key evidence linguistic units, which in most cases are sentences. However, we recently discovered that in many languages sentences are not clearly defined, to varying extents. This causes the so-called location unit ambiguity problem: the model has difficulty determining exactly which sentence contains the answer span when the sentence itself is not clearly defined. Taking Chinese as a case study, we explain and analyze this linguistic phenomenon and propose a reader with Explicit Span-Sentence Predication to alleviate the problem. Our proposed reader achieves a new state of the art on a Chinese MRC benchmark and shows great potential for handling other languages.

Anthology ID: 2021.findings-emnlp.202
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 2348–2359
URL: https://aclanthology.org/2021.findings-emnlp.202
DOI: 10.18653/v1/2021.findings-emnlp.202
Cite (ACL): Jiawei Wang, Hai Zhao, Yinggong Zhao, and Libin Shen. 2021. What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2348–2359, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension (Wang et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.202.pdf
Software: 2021.findings-emnlp.202.Software.zip
Video: https://aclanthology.org/2021.findings-emnlp.202.mp4
Data: CJRC, CMRC, CMRC 2018, DRCD, SQuAD
