RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model
Ming-Hsiang Su | Chin-Wei Lee | Chi-Lun Hsu | Ruei-Cyuan Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
In this study, a named entity recognition (NER) model was constructed and applied to the identification of Chinese medicine names and disease names. The results can be further used in a human-machine dialogue system to provide people with correct Chinese medicine medication reminders. First, we used web crawlers to compile web resources into a Chinese medicine named entity corpus comprising 1,097 articles, 1,412 disease names, and 38,714 Chinese medicine names. Then, we annotated each article using the traditional Chinese medicine (TCM) names and the BIO tagging method. Finally, we trained and evaluated BERT, ALBERT, RoBERTa, and GPT-2 with BiLSTM and CRF. The experimental results show that the NER system combining RoBERTa with BiLSTM and CRF achieves the best performance, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.
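To illustrate the BIO tagging step, the following is a minimal sketch of character-level annotation by greedy longest-match lookup against a name list. The abstract does not describe the actual annotation procedure, so the matching strategy, the function name `bio_tag`, the label names `MED` and `DIS`, and the example lexicon are all assumptions made for illustration.

```python
def bio_tag(text, lexicon):
    """Assign character-level BIO tags by greedy longest-match lookup.

    `lexicon` maps an entity surface string to its type label.
    This is an illustrative sketch, not the authors' annotation tool.
    """
    tags = ["O"] * len(text)
    max_len = max((len(name) for name in lexicon), default=0)
    i = 0
    while i < len(text):
        match = None
        # Try longer candidates first so a long name wins over its substrings.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            i += 1
            continue
        label = lexicon[match]
        tags[i] = f"B-{label}"            # first character of the entity
        for j in range(i + 1, i + len(match)):
            tags[j] = f"I-{label}"        # remaining characters of the entity
        i += len(match)
    return tags


# Hypothetical lexicon entries and label names (MED = medicine, DIS = disease).
lexicon = {"人參": "MED", "感冒": "DIS"}
text = "人參與感冒"
for char, tag in zip(text, bio_tag(text, lexicon)):
    print(char, tag)
```

Each character receives `B-` at the start of a matched name, `I-` inside it, and `O` elsewhere, which is the sequence format a BiLSTM-CRF tagger is typically trained on.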