Chinese Movie Dialogue Question Answering Dataset

Shang-Bao Luo; Cheng-Chung Fan; Kuan-Yu Chen; Yu Tsao; Hsin-Min Wang; Keh-Yih Su

Abstract

This paper constructs CMDQA, a Chinese dialogue-based, information-seeking question answering dataset aimed mainly at the scenario of obtaining information about Chinese movies. It contains 10K QA dialogs (40K turns in total). All questions and background documents were compiled from Wikipedia with a web crawler. The answer to each question is obtained by extracting the corresponding answer span from the related text passage. Beyond requiring the retrieval of related documents, CMDQA also adds pronouns to the questions to better mimic real dialog scenarios. The dataset can therefore test the individual performance of the information retrieval, question answering, and question rewriting modules. This paper also provides a baseline system and reports its performance on the dataset. The experiments show that the baseline still falls far short of human performance, so the dataset offers substantial challenges for researchers conducting related work.
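To make the dialog structure described in the abstract more concrete, here is a minimal sketch of how a single turn could be represented. It assumes each turn stores the question as asked (possibly containing an added pronoun), a self-contained version of that question, the background passage, and the answer as a character-offset span in the passage. The class `Turn`, the helper `make_turn`, all field names, and the toy passage are hypothetical illustrations, not the released CMDQA schema or data.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One QA turn in a hypothetical CMDQA-style dialog record (field names are assumptions)."""
    question: str            # question as asked in the dialog; may contain a pronoun
    rewritten_question: str  # self-contained question with the pronoun resolved
    passage: str             # background passage that contains the answer
    answer_start: int        # character offset of the answer span in `passage`
    answer_text: str         # the extracted answer span

    def span_is_valid(self) -> bool:
        # An extractive answer must appear verbatim at the stated offset.
        end = self.answer_start + len(self.answer_text)
        return self.passage[self.answer_start:end] == self.answer_text


def make_turn(question: str, rewritten: str, passage: str, answer: str) -> Turn:
    # Locate the first occurrence of the answer in the passage; fail loudly if absent.
    start = passage.find(answer)
    if start < 0:
        raise ValueError(f"answer {answer!r} not found in passage")
    return Turn(question, rewritten, passage, start, answer)


# Toy two-turn dialog (invented placeholder text, not taken from the dataset).
passage = "Movie X was directed by Director Y and was released in 2001."
dialog = [
    make_turn("Who directed Movie X?", "Who directed Movie X?", passage, "Director Y"),
    make_turn("When was it released?", "When was Movie X released?", passage, "2001"),
]
for turn in dialog:
    print(turn.rewritten_question, "->", turn.answer_text, turn.span_is_valid())
```

Keeping both the pronoun-bearing question and its self-contained form in the same record is one way to evaluate a question rewriting module separately from the extractive QA module, and storing the character offset alongside the answer text makes it easy to check that every answer really is a span of its passage.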

Anthology ID:: 2022.rocling-1.2
Volume:: Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:: November
Year:: 2022
Address:: Taipei, Taiwan
Editors:: Yung-Chun Chang, Yi-Chin Huang
Venue:: ROCLING
Publisher:: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:: 7–14
Language:: Chinese
URL:: https://aclanthology.org/2022.rocling-1.2/
Cite (ACL):: Shang-Bao Luo, Cheng-Chung Fan, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang, and Keh-Yih Su. 2022. Chinese Movie Dialogue Question Answering Dataset. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 7–14, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):: Chinese Movie Dialogue Question Answering Dataset (Luo et al., ROCLING 2022)
PDF:: https://aclanthology.org/2022.rocling-1.2.pdf