A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

Zhao Hanyu; Yuan Sha; Leng Jiahong; Pan Xiang; Xue Zhao; Ma Quanyue; Liang Yangxiao

Abstract

Machine reading comprehension (MRC) is a typical natural language processing (NLP) task and has developed rapidly in the last few years. Various reading comprehension datasets have been built to support MRC studies. However, large-scale and high-quality datasets are rare due to the high complexity and huge workforce cost of making such a dataset. Besides, most reading comprehension datasets are in English, and Chinese datasets are insufficient. In this paper, we propose an automatic method for MRC dataset generation and build what is presently the largest Chinese medical reading comprehension dataset, named CMedRC. Our dataset contains 17k questions generated by our automatic method and some seed questions. We obtain the corresponding answers from a medical knowledge graph and manually check all of them. Finally, we test BiLSTM and BERT-based pre-trained language models (PLMs) on our dataset and propose a baseline for the following studies. Results show that the automatic MRC dataset generation method is considerable for future model improvements.

Anthology ID: 2021.ccl-1.95
Volume: Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month: August
Year: 2021
Address: Huhhot, China
Editors: Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 1066–1075
Language: English
URL: https://aclanthology.org/2021.ccl-1.95/
Cite (ACL): Zhao Hanyu, Yuan Sha, Leng Jiahong, Pan Xiang, Xue Zhao, Ma Quanyue, and Liang Yangxiao. 2021. A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1066–1075, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal): A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph (Hanyu et al., CCL 2021)
PDF: https://aclanthology.org/2021.ccl-1.95.pdf
