Zan Hongying


2023

Learnable Conjunction Enhanced Model for Chinese Sentiment Analysis
Zhao Bingfei | Zan Hongying | Wang Jiajia | Han Yingjie
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“Sentiment analysis is a crucial text classification task that aims to extract, process, and analyze opinions, sentiments, and subjectivity within texts. In current research on Chinese text, sentence- and aspect-based sentiment analysis is mainly tackled through well-designed models. However, despite the importance of word order and function words as essential means of semantic expression in Chinese, they are often underutilized. This paper presents a new Chinese sentiment analysis method that utilizes a Learnable Conjunctions Enhanced Model (LCEM). The LCEM adjusts the general structure of the pre-trained language model and incorporates conjunction location information into the model’s fine-tuning process. Additionally, we discuss a variant structure of residual connections to construct a residual structure that can learn critical information in the text and optimize it during training. We perform experiments on public datasets and demonstrate that our approach enhances performance on both sentence-level and aspect-based sentiment analysis datasets compared to the baseline pre-trained language models. These results confirm the effectiveness of our proposed method.”
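
The abstract does not say how conjunction locations are encoded. As a minimal sketch, assuming character-level tokenization (as in Chinese BERT-style models) and a small hand-picked conjunction list (both assumptions, not taken from the paper), location information could be represented as a per-character 0/1 mask that a fine-tuning step might take as an extra input. The function name `conjunction_mask` and the word list are hypothetical.

```python
from typing import List, Sequence

# Hypothetical conjunction lexicon for illustration only; not the paper's list.
CONJUNCTIONS = ["虽然", "但是", "因为", "所以", "而且", "不过", "可是", "然而"]


def conjunction_mask(sentence: str, conjunctions: Sequence[str] = CONJUNCTIONS) -> List[int]:
    """Return one 0/1 flag per character: 1 where a listed conjunction occurs."""
    mask = [0] * len(sentence)
    for conj in conjunctions:
        start = sentence.find(conj)
        while start != -1:
            # Flag every character covered by this occurrence of the conjunction.
            for i in range(start, start + len(conj)):
                mask[i] = 1
            start = sentence.find(conj, start + 1)
    return mask


print(conjunction_mask("虽然价格贵，但是质量很好"))
# [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

A mask of this kind aligns one-to-one with character tokens, so it could be turned into an additional embedding or attention bias; how LCEM actually consumes conjunction positions is not specified here.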

2022

MRC-based Medical NER with Multi-task Learning and Multi-strategies
Xiaojing Du | Jia Yuxiang | Zan Hongying
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“Medical named entity recognition (NER), a fundamental task of medical information extraction, is crucial for medical knowledge graph construction, medical question answering, automatic medical record analysis, and related applications. Compared with named entities (NEs) in the general domain, medical named entities are usually more complex and more prone to nesting. To cope with both flat and nested NEs, we propose a machine reading comprehension (MRC)-based approach with multi-task learning and multiple strategies. NER can be treated as a sequence labeling (SL) task or a span boundary detection (SBD) task. We integrate an MRC-CRF model for SL and an MRC-Biaffine model for SBD into the multi-task learning architecture, and select the more efficient MRC-CRF as the final decoder. To further improve the model, we employ multiple strategies, including adaptive pre-training, adversarial training, and model stacking with cross-validation. Experiments on both the nested NER corpus CMeEE and the flat NER corpus CCKS2019 show the effectiveness of the MRC-based model with multi-task learning and multiple strategies.”
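
To make the two views of NER concrete, here is a minimal sketch of how one annotated sentence might be turned into MRC-style examples, one per entity type, carrying labels for both sequence labeling (BIO tags, as a CRF decoder would use) and span boundary detection (a start/end matrix, as a biaffine scorer would use). The query strings, entity-type keys, and the function `build_mrc_examples` are hypothetical; the paper's actual queries and label formats are not given in the abstract.

```python
from typing import Dict, List, Tuple

# (start, end) character offsets, end inclusive, plus an entity type label.
Span = Tuple[int, int, str]

# Hypothetical natural-language queries, one per entity type (illustrative only).
QUERIES: Dict[str, str] = {"sym": "找出文本中的临床表现", "dis": "找出文本中的疾病"}


def build_mrc_examples(text: str, spans: List[Span]) -> List[dict]:
    """Turn one annotated sentence into one MRC example per entity type.

    Each example carries labels for both views of NER:
    - "bio": one BIO tag per character (sequence labeling);
    - "span": an n x n 0/1 matrix marking (start, end) pairs
      (span boundary detection), which can also hold overlapping spans.
    """
    n = len(text)
    examples = []
    for etype, query in QUERIES.items():
        bio = ["O"] * n
        span = [[0] * n for _ in range(n)]
        for start, end, label in spans:
            if label != etype:
                continue
            # Nested spans of the same type cannot be told apart in BIO form;
            # the span matrix keeps each (start, end) pair separately.
            bio[start] = "B"
            for i in range(start + 1, end + 1):
                bio[i] = "I"
            span[start][end] = 1
        examples.append({"query": query, "context": text, "bio": bio, "span": span})
    return examples


examples = build_mrc_examples("患者发热咳嗽", [(2, 3, "sym"), (4, 5, "sym")])
print(examples[0]["bio"])  # ['O', 'O', 'B', 'I', 'B', 'I']
```

Asking one query per entity type is what lets an MRC formulation assign different types to overlapping text; the span matrix additionally covers same-type nesting that BIO tags lose.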

2020

Reusable Phrase Extraction Based on Syntactic Parsing
Xuemin Duan | Zan Hongying | Xiaojing Bai | Christoph Zähner
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Academic Phrasebank is an important resource for academic writers. Student writers use phrases from Academic Phrasebank to organize their research articles and improve their writing. Because of its limited size, however, Academic Phrasebank cannot meet all academic writing needs, and a large amount of the academic phraseology found in authentic research articles is not yet covered. In this paper, we propose an academic phraseology extraction model based on constituency parsing and dependency parsing, which automatically extracts academic phraseology similar to the phrases of Academic Phrasebank from unlabeled research articles. The proposed model consists of three main components: an academic phraseology corpus module, a sentence simplification module, and a syntactic parsing module. We created a 2,129-word corpus of academic phraseology to help judge whether a word is neutral and general, and built two datasets under two scenarios to verify the feasibility of the proposed model.
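
The role of the word corpus can be illustrated in isolation. As a minimal sketch, assuming that words judged neutral and general are kept while topic-specific words are collapsed into a slot (an assumption about how the judgment might be used, not the paper's stated procedure), a tokenized sentence can be reduced to a Phrasebank-style template. `GENERAL_WORDS` is a tiny hypothetical stand-in for the 2,129-word corpus, and the parsing and sentence simplification modules are not reproduced.

```python
from typing import Iterable, List, Set

# Tiny hypothetical stand-in for the 2,129-word academic phraseology corpus.
GENERAL_WORDS: Set[str] = {
    "this", "paper", "aims", "to", "investigate", "examine", "the", "of",
    "we", "propose", "a", "an", "in", "study", "results", "show", "that",
}


def to_phrase_template(tokens: Iterable[str], general: Set[str] = GENERAL_WORDS, slot: str = "...") -> str:
    """Keep general words and collapse each run of other words into one slot."""
    out: List[str] = []
    for tok in tokens:
        if tok.lower() in general:
            out.append(tok)
        elif not out or out[-1] != slot:
            # Start a new slot only if the previous output token is not already one.
            out.append(slot)
    return " ".join(out)


sentence = "This paper aims to investigate the effect of nitrogen on rice yield"
print(to_phrase_template(sentence.split()))
# This paper aims to investigate the ... of ...
```

A purely lexical filter like this cannot tell where a reusable phrase should end; that boundary decision is what the constituency and dependency parses described above would supply.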