YNU-HPCC at ROCLING 2022 Shared Task: A Transformer-based Model with Focal Loss and Regularization Dropout for Chinese Healthcare Named Entity Recognition

Xiang Luo, Jin Wang, Xuejie Zhang


Abstract
Named Entity Recognition (NER) is a fundamental task in information extraction that locates the mentions of named entities and classifies them in unstructured texts. Previous studies typically used hidden Markov model (HMM) and conditional random fields (CRF) for NER. To learn long-distance dependencies in text, recurrent neural networks, e.g., LSTM and GRU can extract the semantic features for each token with a sequential manner. Based on Transformers, this paper describes the contribution to ROCLING-2022 Share Task. This paper adopts a transformer-based model with focal Loss and regularization dropout. The focal loss is to overcome the uneven distribution of the label. The regularization dropout (r-drop) is to address the problem of vocabulary and descriptions that are too domain-specific. The ensemble learning is to improve the performance of the model. Comparative experiments were conducted on dev set to select the model with the best performance for submission. That is, BERT model with BiLSTM-CRF, focal loss and R-Drop has achieved the best F1-score of 0.7768 and rank the 4th place.
Anthology ID:
2022.rocling-1.42
Volume:
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Yung-Chun Chang, Yi-Chin Huang
Venue:
ROCLING
SIG:
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Note:
Pages:
335–342
Language:
Chinese
URL:
https://aclanthology.org/2022.rocling-1.42
DOI:
Bibkey:
Cite (ACL):
Xiang Luo, Jin Wang, and Xuejie Zhang. 2022. YNU-HPCC at ROCLING 2022 Shared Task: A Transformer-based Model with Focal Loss and Regularization Dropout for Chinese Healthcare Named Entity Recognition. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 335–342, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
YNU-HPCC at ROCLING 2022 Shared Task: A Transformer-based Model with Focal Loss and Regularization Dropout for Chinese Healthcare Named Entity Recognition (Luo et al., ROCLING 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.rocling-1.42.pdf
Code
 suntea233/ROCLING-2022