Qin Chen


2023

pdf bib
基于推理链的多跳问答对抗攻击和对抗增强训练方法(Reasoning Chain Based Adversarial Attack and Adversarial Augmentation Training for Multi-hop Question Answering)
Jiayu Ding (丁佳玙) | Siyuan Wang (王思远) | Zhongyu Wei (魏忠钰) | Qin Chen (陈琴) | Xuanjing Huang (黄萱菁)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

This paper proposes an adversarial attack method based on multi-hop reasoning chains: adversarial attack text is inserted into the input, and the accuracy of the answers a QA model generates under this perturbation is measured, in order to probe whether the model truly performs multi-hop reasoning and to assess its interpretability. The method first extracts the reasoning chain from the question entity to the answer entity in the input text, categorizes multi-hop questions into different reasoning types based on features of the chain, and introduces a model that automates question decomposition and reasoning-type prediction; the original question is then modified according to its reasoning type to construct the adversarial distractor sentence. Experiments attacking several multi-hop QA models show that the performance of all of them drops significantly, verifying the effectiveness of the attack and exposing shortcomings of current QA models; after augmenting the original training set with adversarial examples, the performance of every model recovers, demonstrating that the proposed adversarial augmentation training improves model robustness.
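
To make the attack construction concrete, here is a minimal Python sketch of the core idea: take an extracted two-hop reasoning chain, corrupt the final hop with a wrong bridge entity and a wrong answer, and append the resulting distractor sentence to the context. The entities, the example question, and the simple substitution rule are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative sketch of a reasoning-chain-based distractor (assumed toy rule).
from dataclasses import dataclass
from typing import List

@dataclass
class Hop:
    subject: str
    relation: str
    obj: str

def build_distractor(chain: List[Hop], fake_bridge: str, fake_answer: str) -> str:
    """Verbalize a corrupted version of the final hop: keep its relation but swap in
    a wrong bridge entity and a wrong answer, yielding a competing distractor sentence."""
    last = chain[-1]
    return f"{fake_bridge} {last.relation} {fake_answer}."

# Toy 2-hop chain: question entity -> bridge entity -> answer entity.
chain = [
    Hop("the film Inception", "was directed by", "Christopher Nolan"),
    Hop("Christopher Nolan", "was born in", "London"),
]
context = ("Inception was directed by Christopher Nolan. "
           "Christopher Nolan was born in London.")
# Q: "Where was the director of Inception born?"  Gold answer: London.
adversarial_context = context + " " + build_distractor(chain, "Christopher Lee", "Paris")
print(adversarial_context)
```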

2022

pdf bib
ECNU_ICA at SemEval-2022 Task 10: A Simple and Unified Model for Monolingual and Crosslingual Structured Sentiment Analysis
Qi Zhang | Jie Zhou | Qin Chen | Qingchun Bai | Jun Xiao | Liang He
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Sentiment analysis is increasingly viewed as a vital task from both an academic and a commercial standpoint. In this paper, we focus on the structured sentiment analysis task released as SemEval-2022 Task 10. The task aims to extract structured sentiment information (e.g., holder, target, expression and sentiment polarity) from a text. We propose a simple and unified model for both the monolingual and crosslingual structured sentiment analysis tasks. We cast this task as an event extraction task by regarding the expression as the trigger word and the other elements as the arguments of the event. Specifically, we first extract the expression by predicting its start and end indices. Then, to take the expression into account, we design a conditional layer normalization algorithm that extracts the holder and target conditioned on the extracted expression. Finally, we infer the sentiment polarity based on the extracted structured information. Pre-trained language models are utilized to obtain the text representation. We conduct experiments on seven datasets in five languages. The monolingual and crosslingual subtasks attracted 233 submissions from 32 teams, and we ranked in the top 5 on the crosslingual subtasks.
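
A minimal PyTorch sketch of the conditional layer normalization idea described above: the gain and bias of a LayerNorm are generated from the pooled representation of the extracted expression, so holder/target extraction is conditioned on the expression. Module and variable names are illustrative assumptions, not the system's exact implementation.

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, hidden_size: int, cond_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        # Condition-dependent scale and shift, initialized near the identity.
        self.gamma_proj = nn.Linear(cond_size, hidden_size)
        self.beta_proj = nn.Linear(cond_size, hidden_size)
        for layer in (self.gamma_proj, self.beta_proj):
            nn.init.zeros_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, hidden: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size); cond: (batch, cond_size)
        mean = hidden.mean(dim=-1, keepdim=True)
        var = hidden.var(dim=-1, unbiased=False, keepdim=True)
        normed = (hidden - mean) / torch.sqrt(var + self.eps)
        gamma = 1.0 + self.gamma_proj(cond).unsqueeze(1)   # (batch, 1, hidden)
        beta = self.beta_proj(cond).unsqueeze(1)
        return gamma * normed + beta

# Toy usage: condition token states on a pooled expression-span vector.
states = torch.randn(2, 16, 768)      # encoder outputs
expr_vec = torch.randn(2, 768)        # pooled representation of the expression span
cln = ConditionalLayerNorm(768, 768)
conditioned = cln(states, expr_vec)   # same shape as `states`
```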

pdf bib
A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck
Jie Zhou | Qi Zhang | Qin Chen | Qi Zhang | Liang He | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Event argument extraction (EAE) aims to extract arguments with given roles from texts, a problem that has been widely studied in natural language processing. Most previous works achieve good performance on specific EAE datasets with dedicated neural architectures. However, these architectures are usually difficult to adapt to new datasets or scenarios with different annotation schemas or formats. Furthermore, they rely on large-scale labeled data for training, which is often unavailable due to the high labeling cost. In this paper, we propose a multi-format transfer learning model with a variational information bottleneck, which makes use of the information, especially the common knowledge, in existing datasets to improve EAE on new datasets. Specifically, we introduce a shared-specific prompt framework to learn both format-shared and format-specific knowledge from datasets with different formats. To further absorb the common knowledge for EAE and eliminate irrelevant noise, we integrate a variational information bottleneck into our architecture to refine the shared representation. We conduct extensive experiments on three benchmark datasets and obtain new state-of-the-art performance on EAE.
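
A minimal PyTorch sketch of a variational information bottleneck layer of the kind described above: the format-shared representation is compressed into a stochastic latent code via the reparameterization trick, and a KL penalty toward a standard normal prior discourages irrelevant information from passing through. The dimensions and the bottleneck weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VIBLayer(nn.Module):
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, shared_repr: torch.Tensor):
        mu = self.mu(shared_repr)
        logvar = self.logvar(shared_repr)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return z, kl

vib = VIBLayer(768, 256)
shared = torch.randn(8, 768)          # format-shared representation
z, kl = vib(shared)
beta = 1e-3                           # bottleneck weight (assumed)
# total_loss = task_loss + beta * kl  # combine with the EAE training objective
```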

2021

pdf bib
Attending via both Fine-tuning and Compressing
Jie Zhou | Yuanbin Wu | Qin Chen | Xuanjing Huang | Liang He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
基于多质心异质图学习的社交网络用户建模(User Representation Learning based on Multi-centroid Heterogeneous Graph Neural Networks)
Shangyi Ning (宁上毅) | Guanying Li (李冠颖) | Qin Chen (陈琴) | Zengfeng Huang (黄增峰) | Baohua Zhou (周葆华) | Zhongyu Wei (魏忠钰)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

User modeling has attracted wide attention in both academia and industry. Most existing methods focus on incorporating interpersonal relations between users into communities, while user-generated content (such as posts) has not been well studied. In this paper, an analysis of real-world public-opinion propagation demonstrates the important role that user attributes play in the propagation process, and we propose a filtering method for user profile data. We further propose a model that captures more diverse community features for each user through heterogeneous multi-centroid graph pooling. We first construct a heterogeneous graph consisting of users and keywords and apply a heterogeneous graph neural network on it. To facilitate graph representations for user modeling, a multi-centroid graph pooling mechanism is proposed that incorporates multi-centroid cluster features into representation learning. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method.
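
A minimal PyTorch sketch of one way multi-centroid graph pooling can be realized: node embeddings from the heterogeneous user-keyword graph are softly assigned to K learnable centroids, and the cluster-wise pooled features are concatenated into the user representation. The number of centroids, dimensions, and the soft-assignment rule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MultiCentroidPooling(nn.Module):
    def __init__(self, hidden: int, num_centroids: int):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_centroids, hidden))

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        # node_emb: (num_nodes, hidden) from a heterogeneous GNN encoder.
        scores = node_emb @ self.centroids.t()                 # (nodes, K)
        assign = torch.softmax(scores, dim=-1)                 # soft cluster assignment
        # Weighted mean of nodes per centroid: (K, hidden).
        cluster_repr = assign.t() @ node_emb / (assign.sum(dim=0).unsqueeze(-1) + 1e-8)
        return cluster_repr.flatten()                          # (K * hidden,)

pool = MultiCentroidPooling(hidden=128, num_centroids=4)
nodes = torch.randn(50, 128)       # embeddings of a user's neighborhood subgraph
user_vector = pool(nodes)          # multi-centroid user representation
```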

pdf bib
ECNUICA at SemEval-2021 Task 11: Rule based Information Extraction Pipeline
Jiaju Lin | Jing Ling | Zhiwei Wang | Jiawei Liu | Qin Chen | Liang He
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents our endeavor in solving Task 11, NLPContributionGraph, of SemEval-2021. The purpose of the task was to extract triples from a paper in the Natural Language Processing field for constructing an Open Research Knowledge Graph. The task includes three sub-tasks: detecting the contribution sentences in papers; identifying scientific terms and predicate phrases in the contribution sentences; and inferring triples in the form of (subject, predicate, object) as statements for knowledge graph building. In this paper, we apply an ensemble of various fine-tuned pre-trained language models (PLMs) for the first two sub-tasks. In addition, self-training methods are adopted to tackle the shortage of annotated data. For the third sub-task, rather than using classic neural open information extraction (OIE) architectures, we generate candidate triples via manually designed rules and develop a binary classifier to differentiate the positive ones from the others. The quantitative results show that we ranked 4th, 2nd, and 2nd in the three evaluation phases.
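
An illustrative Python sketch of the rule-based third stage described above: candidate (subject, predicate, object) triples are enumerated from the terms and predicate phrases found in a contribution sentence, then a binary classifier keeps the plausible ones. The pairing rule and the stand-in scorer are assumptions for illustration, not the system's exact rules.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

def generate_candidates(terms: List[str], predicates: List[str]) -> List[Triple]:
    """Rule: pair every ordered pair of distinct terms with every predicate phrase."""
    return [(s, p, o) for s, o in permutations(terms, 2) for p in predicates]

def filter_triples(cands: List[Triple], score: Callable[[Triple], float],
                   threshold: float = 0.5) -> List[Triple]:
    """Keep the candidates the binary classifier scores above the threshold."""
    return [t for t in cands if score(t) >= threshold]

# Toy usage with a stand-in scorer (a fine-tuned PLM classifier would be used in practice).
terms = ["sequence labeling model", "contribution sentences"]
predicates = ["detects"]
toy_score = lambda t: 0.9 if t[0].endswith("model") else 0.1
print(filter_triples(generate_candidates(terms, predicates), toy_score))
```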

2020

pdf bib
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition
Yun He | Ziwei Zhu | Yin Zhang | Qin Chen | James Caverlee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Knowledge of a disease includes information on various aspects of the disease, such as signs and symptoms, diagnosis and treatment. This disease knowledge is critical for many health-related and biomedical tasks, including consumer health question answering, medical language inference and disease name recognition. While pre-trained language models like BERT have shown success in capturing syntactic, semantic, and world knowledge from text, we find they can be further complemented by specific information such as knowledge of symptoms, diagnoses, treatments, and other disease aspects. Hence, we integrate BERT with disease knowledge to improve these important tasks. Specifically, we propose a new disease knowledge infusion training procedure and evaluate it on a suite of BERT models including BERT, BioBERT, SciBERT, ClinicalBERT, BlueBERT, and ALBERT. Experiments over the three tasks show that these models can be enhanced in nearly all cases, demonstrating the viability of disease knowledge infusion. For example, the accuracy of BioBERT on consumer health question answering is improved from 68.29% to 72.09%, and new SOTA results are observed on two datasets. We make our data and code freely available.
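
A minimal sketch of one plausible form of the infusion step, assuming it works as an auxiliary cloze-style task: the disease name is masked in a passage about one aspect (e.g., symptoms) and the model is trained to recover it. The model choice, the toy passage, and the masking rule are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

disease = "influenza"
passage = f"{disease} symptoms include fever, cough, sore throat and muscle aches."

enc = tokenizer(passage, return_tensors="pt")
labels = torch.full_like(enc["input_ids"], -100)        # ignore unmasked positions

# Mask every token of the disease mention and keep its ids as the prediction targets.
disease_ids = tokenizer(disease, add_special_tokens=False)["input_ids"]
for i, tok in enumerate(enc["input_ids"][0].tolist()):
    if tok in disease_ids:
        labels[0, i] = tok
        enc["input_ids"][0, i] = tokenizer.mask_token_id

loss = model(**enc, labels=labels).loss                  # auxiliary infusion loss
loss.backward()                                          # one continued-pretraining step
```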

pdf bib
Automatic Term Name Generation for Gene Ontology: Task and Dataset
Yanjian Zhang | Qin Chen | Yiteng Zhang | Zhongyu Wei | Yixu Gao | Jiajie Peng | Zengfeng Huang | Weijian Sun | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Terms contained in the Gene Ontology (GO) have been widely used in biology and biomedicine. Most previous research focuses on inferring new GO terms, while the term names that reflect gene function are still written by experts. To fill this gap, we propose a novel task, namely term name generation for GO, and build a large-scale benchmark dataset. Furthermore, we present a graph-based generative model that incorporates the relations between genes, words and terms for term name generation, which exhibits great advantages over strong baselines.

2019

pdf bib
Enhancing Dialogue Symptom Diagnosis with Global Attention and Symptom Graph
Xinzhu Lin | Xiahui He | Qin Chen | Huaixiao Tou | Zhongyu Wei | Ting Chen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Symptom diagnosis is a challenging yet profound problem in natural language processing. Most previous research focuses on standard electronic medical records for symptom diagnosis, while the dialogues between doctors and patients, which contain much richer information, are not well studied. In this paper, we first construct a dialogue symptom diagnosis dataset based on an online medical forum with a large number of dialogues between patients and doctors. Then, we provide several benchmark models on this dataset to boost research on dialogue symptom diagnosis. To further enhance the performance of symptom diagnosis over dialogues, we propose a global attention mechanism to capture more symptom-related information, and build a symptom graph to model the associations between symptoms rather than treating each symptom independently. Experimental results show that both the global attention and the symptom graph are effective in boosting dialogue symptom diagnosis. In particular, our proposed model achieves state-of-the-art performance on the constructed dataset.
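
A minimal PyTorch sketch of the symptom-graph idea described above: symptom embeddings are refined by propagating over a co-occurrence adjacency matrix (GCN-style), so related symptoms share evidence instead of being scored independently. The toy graph, dimensions, and single propagation step are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SymptomGraphLayer(nn.Module):
    def __init__(self, num_symptoms: int, dim: int, adj: torch.Tensor):
        super().__init__()
        # Symmetrically normalize the adjacency (with self-loops) once, up front.
        a = adj + torch.eye(num_symptoms)
        d_inv_sqrt = a.sum(dim=-1).clamp(min=1e-8).pow(-0.5)
        self.register_buffer("norm_adj", d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :])
        self.emb = nn.Embedding(num_symptoms, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self) -> torch.Tensor:
        # One propagation step: each symptom aggregates its neighbors' embeddings.
        return torch.relu(self.norm_adj @ self.proj(self.emb.weight))

# Toy 3-symptom graph: fever-cough and cough-sore_throat co-occur.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
layer = SymptomGraphLayer(num_symptoms=3, dim=64, adj=adj)
symptom_repr = layer()   # (3, 64), used to score symptoms against dialogue states
```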