MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset

Jing Li, Shangping Zhong, Kaizhi Chen


Abstract
Question Answering (QA) has been successfully applied in human-computer interaction scenarios such as chatbots and search engines. In the biomedical domain, however, QA systems are still immature because expert-annotated datasets are limited in category and scale. In this paper, we present MLEC-QA, the largest-scale Chinese multi-choice biomedical QA dataset, collected from the National Medical Licensing Examination in China. The dataset is composed of five subsets with 136,236 biomedical multi-choice questions with extra materials (images or tables) annotated by human experts, and is the first to cover the following biomedical sub-fields: Clinic, Stomatology, Public Health, Traditional Chinese Medicine, and Traditional Chinese Medicine Combined with Western Medicine. We implement eight representative control methods and open-domain QA methods as baselines. Experimental results demonstrate that even the current best model achieves accuracies of only 40% to 55% on the five subsets and performs especially poorly on questions that require sophisticated reasoning ability. We hope the release of the MLEC-QA dataset can serve as a valuable resource for research and evaluation in open-domain QA and help advance biomedical QA systems.
Anthology ID:
2021.emnlp-main.698
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8862–8874
URL:
https://aclanthology.org/2021.emnlp-main.698
DOI:
10.18653/v1/2021.emnlp-main.698
Cite (ACL):
Jing Li, Shangping Zhong, and Kaizhi Chen. 2021. MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8862–8874, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset (Li et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.698.pdf
Video:
https://aclanthology.org/2021.emnlp-main.698.mp4
Code
 judenpech/mlec-qa
Data
HeadQA, MedQA