Yafei Shi


2021

pdf bib
Classification, Extraction, and Normalization : CASIA_Unisound Team at the Social Media Mining for Health 2021 Shared Tasks
Tong Zhou | Zhucong Li | Zhen Gan | Baoli Zhang | Yubo Chen | Kun Niu | Jing Wan | Kang Liu | Jun Zhao | Yafei Shi | Weifeng Chong | Shengping Liu
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This is the system description of the CASIA_Unisound team for Task 1, Task 7b, and Task 8 of the sixth Social Media Mining for Health Applications (SMM4H) shared task in 2021. Targeting on deal with two shared challenges, the colloquial text and the imbalance annotation, among those tasks, we apply a customized pre-trained language model and propose various training strategies. Experimental results show the effectiveness of our system. Moreover, we got an F1-score of 0.87 in task 8, which is the highest among all participates.

pdf bib
Biomedical Concept Normalization by Leveraging Hypernyms
Cheng Yan | Yuanzhe Zhang | Kang Liu | Jun Zhao | Yafei Shi | Shengping Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Biomedical Concept Normalization (BCN) is widely used in biomedical text processing as a fundamental module. Owing to numerous surface variants of biomedical concepts, BCN still remains challenging and unsolved. In this paper, we exploit biomedical concept hypernyms to facilitate BCN. We propose Biomedical Concept Normalizer with Hypernyms (BCNH), a novel framework that adopts list-wise training to make use of both hypernyms and synonyms, and also employs norm constraint on the representation of hypernym-hyponym entity pairs. The experimental results show that BCNH outperforms the previous state-of-the-art model on the NCBI dataset.

pdf bib
CroAno : A Crowd Annotation Platform for Improving Label Consistency of Chinese NER Dataset
Baoli Zhang | Zhucong Li | Zhen Gan | Yubo Chen | Jing Wan | Kang Liu | Jun Zhao | Shengping Liu | Yafei Shi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper, we introduce CroAno, a web-based crowd annotation platform for the Chinese named entity recognition (NER). Besides some basic features for crowd annotation like fast tagging and data management, CroAno provides a systematic solution for improving label consistency of Chinese NER dataset. 1) Disagreement Adjudicator: CroAno uses a multi-dimensional highlight mode to visualize instance-level inconsistent entities and makes the revision process user-friendly. 2) Inconsistency Detector: CroAno employs a detector to locate corpus-level label inconsistency and provides users an interface to correct inconsistent entities in batches. 3) Prediction Error Analyzer: We deconstruct the entity prediction error of the model to six fine-grained entity error types. Users can employ this error system to detect corpus-level inconsistency from a model perspective. To validate the effectiveness of our platform, we use CroAno to revise two public datasets. In the two revised datasets, we get an improvement of +1.96% and +2.57% F1 respectively in model performance.