Liang He - ACL Anthology

Liang He

2025

Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering
Linhao Ye | Lang Yu | Zhikai Lei | Qin Chen | Jie Zhou | Liang He
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-augmented generation (RAG) is usually integrated into large language models (LLMs) to mitigate hallucinations and knowledge obsolescence. Whereas, conventional one-step retrieve-and-read methods are insufficient for multi-hop question answering, facing challenges of retrieval semantic mismatching and the high cost in handling interdependent subquestions. In this paper, we propose Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering (Q-DREAM). Q-DREAM consists of three key modules: (1) the Question Decomposition Module (QDM), which decomposes multi-hop questions into fine-grained subquestions; (2) the Subquestion Dependency Optimizer Module (SDOM), which models the interdependent relations of subquestions for better understanding; and (3) the Dynamic Passage Retrieval Module (DPRM), which aligns subquestions with relevant passages by optimizing the semantic embeddings.Experimental results across various benchmarks demonstrate that Q-DREAM significantly outperforms existing RAG methods, achieving state-of-the-art performance in both in-domain and out-of-domain settings. Notably, Q-DREAM also improves retrieval efficiency while maintaining high accuracy compared with recent baselines.

ACE-M³: Automatic Capability Evaluator for Multimodal Medical Models
Xiechi Zhang | Shunfan Zheng | Linlin Wang | Gerard de Melo | Zhu Cao | Xiaoling Wang | Liang He
Proceedings of the 31st International Conference on Computational Linguistics

As multimodal large language models (MLLMs) gain prominence in the medical field, the need for precise evaluation methods to assess their effectiveness has become critical. While benchmarks provide a reliable means to evaluate the capabilities of MLLMs, traditional metrics like ROUGE and BLEU employed for open domain evaluation only focus on token overlap and may not align with human judgment. While human evaluation is more reliable, it is labor-intensive, costly, and not scalable. LLM-based evaluation methods have proven promising, but to date, there is still an urgent need for open-source multimodal LLM-based evaluators in the medical field. To address this issue, we introduce ACE-M³, an open-sourced Automatic Capability Evaluator for Multimodal Medical Models that specifically designed to assess the question answering abilities of medical MLLMs. It first utilizes a branch-merge architecture to provide both detailed analysis and a concise final score based on standard medical evaluation criteria. Subsequently, a reward token-based direct preference optimization (RTDPO) strategy is incorporated to save training time without compromising performance of our model. Extensive experiments have demonstrated the effectiveness of our ACE-M³ model in evaluating the capabilities of medical MLLMs.

AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
Xiechi Zhang | Zetian Ouyang | Linlin Wang | Gerard De Melo | Zhu Cao | Xiaoling Wang | Ya Zhang | Yanfeng Wang | Liang He
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the proliferation of large language models (LLMs) in the medical domain, there is increasing demand for improved evaluation techniques to assess their capabilities. However, traditional metrics like F1 and ROUGE, which rely on token overlaps to measure quality, significantly overlook the importance of medical terminology. While human evaluation tends to be more reliable, it can be very costly and may as well suffer from inaccuracies due to limits in human expertise and motivation. Although there are some evaluation methods based on LLMs, their usability in the medical field is limited due to their proprietary nature or lack of expertise. To tackle these challenges, we present AutoMedEval, an open-sourced automatic evaluation model with 13B parameters specifically engineered to measure the question-answering proficiency of medical LLMs. The overarching objective of AutoMedEval is to assess the quality of responses produced by diverse models, aspiring to significantly reduce the dependence on human evaluation. Specifically, we propose a hierarchical training method involving curriculum instruction tuning and an iterative knowledge introspection mechanism, enabling AutoMedEval to acquire professional medical assessment capabilities with limited instructional data. Human evaluations indicate that AutoMedEval surpasses other baselines in terms of correlation with human judgments.

P-React: Synthesizing Topic-Adaptive Reactions of Personality Traits via Mixture of Specialized LoRA Experts
Yuhao Dan | Jie Zhou | Qin Chen | Junfeng Tian | Liang He
Findings of the Association for Computational Linguistics: ACL 2025

Personalized large language models (LLMs) have attracted great attention in many applications, such as emotional support and role-playing. However, existing works primarily focus on modeling explicit character profiles, while ignoring the underlying personality traits that truly shape behaviors and decision-making, hampering the development of more anthropomorphic and psychologically-grounded AI systems. In this paper, we explore the modeling of Big Five personality traits, which is the most widely used trait theory in psychology, and propose P-React, a mixture of experts (MoE)-based personalized LLM. Particularly, we integrate a Personality Specialization Loss (PSL) to better capture individual trait expressions, providing a more nuanced and psychologically grounded personality simulacrum. To facilitate research in this field, we curate OCEAN-Chat, a high-quality, human-verified dataset designed to train LLMs in expressing personality traits across diverse topics. Extensive experiments demonstrate the effectiveness of P-React in maintaining consistent and real personality.

Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment
Hongda Sun | Jiaren Peng | Wenzhong Yang | Liang He | Bo Du | Rui Yan
Findings of the Association for Computational Linguistics: ACL 2025

Medical dialogue systems (MDS) have emerged as crucial online platforms for enabling multi-turn, context-aware conversations with patients. However, existing MDS often struggle to (1) identify relevant medical knowledge and (2) generate personalized, medically accurate responses. To address these challenges, we propose MedRef, a novel MDS that incorporates knowledge refining and dynamic prompt adjustment. First, we employ a knowledge refining mechanism to filter out irrelevant medical data, improving predictions of critical medical entities in responses. Additionally, we design a comprehensive prompt structure that incorporates historical details and evident details. To enable real-time adaptability to diverse patient conditions, we implement two key modules, Triplet Filter and Demo Selector, providing appropriate knowledge and demonstrations equipped in the system prompt.Extensive experiments on MedDG and KaMed benchmarks show that MedRef outperforms state-of-the-art baselines in both generation quality and medical entity accuracy, underscoring its effectiveness and reliability for real-world healthcare applications.

RGR-KBQA: Generating Logical Forms for Question Answering Using Knowledge-Graph-Enhanced Large Language Model
Tengfei Feng | Liang He
Proceedings of the 31st International Conference on Computational Linguistics

In the field of natural language processing, Knowledge Base Question Answering (KBQA) is a challenging task that involves accurately retrieving answers from structured knowledge. Existing methods often face issues when generating query statements using LLMs, as the knowledge introduced may be imprecise and the models themselves may exhibit hallucination problems, leading to low accuracy, particularly when dealing with complex questions. To address these challenges, we introduce a novel semantic parsing approach called RGR-KBQA, which adopts a Retrieve-Generate-Retrieve framework. The first retrieval step introduces factual knowledge from a knowledge graph to enhance the semantic understanding capabilities of LLMs, thereby improving generation accuracy of logical form. The second step uses a fine-tuned model to generate the logical form, and the final step involves unsupervised relation and entity retrieval to further enhance generation accuracy. These two retrieval steps help alleviate the hallucination problems inherent in LLMs. Experimental results show that RGR-KBQA demonstrate promising performance on CWQ and WebQSP datasets.

2024

CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios
Zetian Ouyang | Yishuai Qiu | Linlin Wang | Gerard De Melo | Ya Zhang | Yanfeng Wang | Liang He
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly. We present CliMedBench, a comprehensive benchmark with 14 expert-guided core clinical scenarios specifically designed to assess the medical ability of LLMs across 7 pivot dimensions. It comprises 33,735 questions derived from real-world medical reports of top-tier tertiary hospitals and authentic examination exercises. The reliability of this benchmark has been confirmed in several ways. Subsequent experiments with existing LLMs have led to the following findings: (i) Chinese medical LLMs underperform on this benchmark, especially where medical reasoning and factual consistency are vital, underscoring the need for advances in clinical knowledge and diagnostic accuracy. (ii) Several general-domain LLMs demonstrate substantial potential in medical clinics, while the limited input capacity of many medical LLMs hinders their practical use. These findings reveal both the strengths and limitations of LLMs in clinical scenarios and offer critical insights for medical research.

How Do Humans Write Code? Large Models Do It the Same Way Too
Long Li | Xuzheng He | Haozhe Wang | Linlin Wang | Liang He
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in Large Language Models (LLMs) mathematical reasoning tasks by utilizing external tool calls to circumvent computational errors. However, our evaluation of the GPT-4 and Llama series reveals that using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT. To address this issue, we propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT, encompassing: (1) a new generation paradigm that uses full CoT reasoning to control code generation. (2) Focus Attention, that directs model attention to the CoT reasoning during PoT to generate more logical code. (3) reinforcement learning that utilizes the accuracy of both CoT and PoT responses as rewards to prevent repetitive reasoning steps in LLMs when solving difficult math problems. Our method achieves an average improvement of 6.5% on the Llama-Base model and 4.3% on the Mistral-Base model across 8 mathematical calculation datasets. It also shows significant effectiveness on five out-of-domain datasets by controlling the model’s information flow, exhibiting strong transferability. Additionally, HTL shows the most significant improvement in non-mathematical natural language inference task, contributing to a unified reasoning task framework.

Learning Intrinsic Dimension via Information Bottleneck for Explainable Aspect-based Sentiment Analysis
Zhenxiao Cheng | Jie Zhou | Wen Wu | Qin Chen | Liang He
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Gradient-based explanation methods are increasingly used to interpret neural models in natural language processing (NLP) due to their high fidelity. Such methods determine word-level importance using dimension-level gradient values through a norm function, often presuming equal significance for all gradient dimensions. However, in the context of Aspect-based Sentiment Analysis (ABSA), our preliminary research suggests that only specific dimensions are pertinent. To address this, we propose the Information Bottleneck-based Gradient (IBG) explanation framework for ABSA. This framework leverages an information bottleneck to refine word embeddings into a concise intrinsic dimension, maintaining essential features and omitting unrelated information. Comprehensive tests show that our IBG approach considerably improves both the models’ performance and the explanations’ clarity by identifying sentiment-aware features.

A Regularization-based Transfer Learning Method for Information Extraction via Instructed Graph Decoder
Kedi Chen | Jie Zhou | Qin Chen | Shunyu Liu | Liang He
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Information extraction (IE) aims to extract complex structured information from the text. Numerous datasets have been constructed for various IE tasks, leading to time-consuming and labor-intensive data annotations. Nevertheless, most prevailing methods focus on training task-specific models, while the common knowledge among different IE tasks is not explicitly modeled. Moreover, the same phrase may have inconsistent labels in different tasks, which poses a big challenge for knowledge transfer using a unified model. In this study, we propose a regularization-based transfer learning method for IE (TIE) via an instructed graph decoder. Specifically, we first construct an instruction pool for datasets from all well-known IE tasks, and then present an instructed graph decoder, which decodes various complex structures into a graph uniformly based on corresponding instructions. In this way, the common knowledge shared with existing datasets can be learned and transferred to a new dataset with new labels. Furthermore, to alleviate the label inconsistency problem among various IE tasks, we introduce a task-specific regularization strategy, which does not update the gradients of two tasks with ‘opposite direction’. We conduct extensive experiments on 12 datasets spanning four IE tasks, and the results demonstrate the great advantages of our proposed method.

MGCL: Multi-Granularity Clue Learning for Emotion-Cause Pair Extraction via Cross-Grained Knowledge Distillation
Yang Yu | Xin Alex Lin | Changqun Li | Shizhou Huang | Liang He
Findings of the Association for Computational Linguistics: EMNLP 2024

Emotion-cause pair extraction (ECPE) aims to identify emotion clauses and their corresponding cause clauses within a document. Traditional methods often rely on coarse-grained clause-level annotations, which can overlook valuable fine-grained clues. To address this issue, we propose Multi-Granularity Clue Learning (MGCL), a novel approach designed to capture fine-grained emotion-cause clues from a weakly-supervised perspective efficiently. In MGCL, a teacher model is leveraged to give sub-clause clues without needing fine-grained annotated labels and guides a student model to identify clause-level emotion-cause pairs. Furthermore, we explore domain-invariant extra-clause clues under the teacher model’s advice to enhance the learning process. Experimental results on the benchmark dataset demonstrate that our method achieves state-of-the-art performance while offering improved interpretability.

DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
Kedi Chen | Qin Chen | Jie Zhou | He Yishen | Liang He
Findings of the Association for Computational Linguistics: EMNLP 2024

Though large language models (LLMs) achieve significant success in recent years, the hallucination issue remains a challenge, and numerous benchmarks are proposed for hallucination detection. Nevertheless, some of these benchmarks are not naturally generated by LLMs but are intentionally induced. Also, many merely focus on the factuality hallucination while ignoring the faithfulness hallucination. Additionally, although dialogue pattern is more widely utilized in the era of LLMs, current benchmarks only concentrate on sentence-level and passage-level hallucination. In this study, we propose DiaHalu, the first dedicated dialogue-level hallucination evaluation benchmark for LLMs to our knowledge. Initially, we integrate the collected topics into system prompts and facilitate a dialogue between two LLMs. Subsequently, we manually modify the contents that do not adhere to human language conventions and then have LLMs re-generate, simulating authentic human-machine interaction scenarios. Finally, professional scholars annotate all the samples in the dataset. DiaHalu covers four common multi-turn dialogue domains and five hallucination subtypes, extended from factuality and faithfulness hallucination. Experiments through some well-known LLMs and detection methods on the dataset show that DiaHalu is a challenging benchmark, holding significant value for further research.

Hypernetwork-Assisted Parameter-Efficient Fine-Tuning with Meta-Knowledge Distillation for Domain Knowledge Disentanglement
Changqun Li | Linlin Wang | Xin Lin | Shizhou Huang | Liang He
Findings of the Association for Computational Linguistics: NAACL 2024

Domain adaptation from labeled source domains to the target domain is important in practical summarization scenarios. However, the key challenge is domain knowledge disentanglement. In this work, we explore how to disentangle domain-invariant knowledge from source domains while learning specific knowledge of the target domain. Specifically, we propose a hypernetwork-assisted encoder-decoder architecture with parameter-efficient fine-tuning. It leverages a hypernetwork instruction learning module to generate domain-specific parameters from the encoded inputs accompanied by task-related instruction. Further, to better disentangle and transfer knowledge from source domains to the target domain, we introduce a meta-knowledge distillation strategy to build a meta-teacher model that captures domain-invariant knowledge across multiple domains and use it to transfer knowledge to students. Experiments on three dialogue summarization datasets show the effectiveness of the proposed model. Human evaluations also show the superiority of our model with regard to the summary generation quality.

Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis
Xuanwen Ding | Jie Zhou | Liang Dou | Qin Chen | Yuanbin Wu | Arlene Chen | Liang He
Findings of the Association for Computational Linguistics: EMNLP 2024

Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis, which aims to extract the aspects and predict their sentiments. Most existing studies focus on improving the performance of the target domain by fine-tuning domain-specific models (trained on source domains) based on the target domain dataset. Few works propose continual learning tasks for ABSA, which aim to learn the target domain’s ability while maintaining the history domains’ abilities. In this paper, we propose a Large Language Model-based Continual Learning (LLM-CL) model for ABSA. First, we design a domain knowledge decoupling module to learn a domain-invariant adapter and separate domain-variant adapters dependently with an orthogonal constraint. Then, we introduce a domain knowledge warmup strategy to align the representation between domain-invariant and domain-variant knowledge. In the test phase, we index the corresponding domain-variant knowledge via domain positioning to not require each sample’s domain ID. Extensive experiments over 19 datasets indicate that our LLM-CL model obtains new state-of-the-art performance.

Let’s Rectify Step by Step: Improving Aspect-based Sentiment Analysis with Diffusion Models
Shunyu Liu | Jie Zhou | Qunxi Zhu | Qin Chen | Qingchun Bai | Jun Xiao | Liang He
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Aspect-Based Sentiment Analysis (ABSA) stands as a crucial task in predicting the sentiment polarity associated with identified aspects within text. However, a notable challenge in ABSA lies in precisely determining the aspects’ boundaries (start and end indices), especially for long ones, due to users’ colloquial expressions. We propose DiffusionABSA, a novel diffusion model tailored for ABSA, which extracts the aspects progressively step by step. Particularly, DiffusionABSA gradually adds noise to the aspect terms in the training process, subsequently learning a denoising process that progressively restores these terms in a reverse manner. To estimate the boundaries, we design a denoising neural network enhanced by a syntax-aware temporal attention mechanism to chronologically capture the interplay between aspects and surrounding text. Empirical evaluations conducted on eight benchmark datasets underscore the compelling advantages offered by DiffusionABSA when compared against robust baseline models. Our code is publicly available at https://github.com/Qlb6x/DiffusionABSA.

C-LLM: Learn to Check Chinese Spelling Errors Character by Character
Kunting Li | Yong Hu | Liang He | Fandong Meng | Jie Zhou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Chinese Spell Checking (CSC) aims to detect and correct spelling errors in sentences. Despite Large Language Models (LLMs) exhibit robust capabilities and are widely applied in various tasks, their performance on CSC is often unsatisfactory. We find that LLMs fail to meet the Chinese character-level constraints of the CSC task, namely equal length and phonetic similarity, leading to a performance bottleneck. Further analysis reveals that this issue stems from the granularity of tokenization, as current mixed character-word tokenization struggles to satisfy these character-level constraints. To address this issue, we propose C-LLM, a Large Language Model-based Chinese Spell Checking method that learns to check errors Character by Character. Character-level tokenization enables the model to learn character-level alignment, effectively mitigating issues related to character-level constraints. Furthermore, CSC is simplified to replication-dominated and substitution-supplemented tasks. Experiments on two CSC benchmarks demonstrate that C-LLM achieves a 2.1% enhancement in general scenarios and a significant 12% improvement in vertical domain scenarios compared to existing methods, establishing state-of-the-art performance.

MixRED: A Mix-lingual Relation Extraction Dataset
Lingxing Kong | Yougang Chu | Zheng Ma | Jianbing Zhang | Liang He | Jiajun Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix contents from different languages within sentences, generating mix-lingual content. Due to the lack of a dedicated dataset, the effectiveness of existing relation extraction models in such a scenario is largely unexplored. To address this issue, we introduce a novel task of considering relation extraction in the mix-lingual scenario called MixRE and constructing the human-annotated dataset MixRED to support this task. In addition to constructing the MixRED dataset, we evaluate both state-of-the-art supervised models and large language models (LLMs) on MixRED, revealing their respective advantages and limitations in the mix-lingual scenario. Furthermore, we delve into factors influencing model performance within the MixRE task and uncover promising directions for enhancing the performance of both supervised models and LLMs in this novel task.

Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis
Luwei Xiao | Rui Mao | Xulang Zhang | Liang He | Erik Cambria
Findings of the Association for Computational Linguistics: EMNLP 2024

Prevailing research concentrates on superficial features or descriptions of images, revealing a significant gap in the systematic exploration of their connotative and aesthetic attributes. Furthermore, the use of cross-modal relation detection modules to eliminate noise from comprehensive image representations leads to the omission of subtle contextual information. In this paper, we present a Visual Connotation and Aesthetic Attributes Understanding Network (Vanessa) for Multimodal Aspect-based Sentiment Analysis. Concretely, Vanessa incorporates a Multi-Aesthetic Attributes Aggregation (MA3) module that models intra- and inter-dependencies among bi-modal representations as well as emotion-laden aesthetic attributes. Moreover, we devise a self-supervised contrastive learning framework to explore the pairwise relevance between images and text via the Gaussian distribution of their CLIP scores. By dynamically clustering and merging multi-modal tokens, Vanessa effectively captures both implicit and explicit sentimental cues. Extensive experiments on widely adopted two benchmarks verify Vanessa’s effectiveness.

2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye | Anwen Hu | Haiyang Xu | Qinghao Ye | Ming Yan | Guohai Xu | Chenliang Li | Junfeng Tian | Qi Qian | Ji Zhang | Qin Jin | Liang He | Xin Lin | Fei Huang
Findings of the Association for Computational Linguistics: EMNLP 2023

Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM). By leveraging the shallow text recognition ability of the MLLM, we only finetuned 1.2% parameters and the training cost is much lower than previous work following domain-specific pretraining and finetuning paradigms. Concretely, UReader is jointly finetuned on a wide range of Visually-situated Language Understanding tasks via a unified instruction format. To enhance the visual text and semantic understanding, we further apply two auxiliary tasks with the same format, namely text reading and key points generation tasks. We design a shape-adaptive cropping module before the encoder-decoder architecture of MLLM to leverage the frozen low-resolution vision encoder for processing high-resolution images. Without downstream finetuning, our single model achieves state-of-the-art ocr-free performance in 8 out of 10 visually-situated language understanding tasks, across 5 domains: documents, tables, charts, natural images, and webpage screenshots. Codes and instruction-tuning datasets will be released.

SKD-NER: Continual Named Entity Recognition via Span-based Knowledge Distillation with Reinforcement Learning
Yi Chen | Liang He
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Continual learning for named entity recognition (CL-NER) aims to enable models to continuously learn new entity types while retaining the ability to recognize previously learned ones. However, the current strategies fall short of effectively addressing the catastrophic forgetting of previously learned entity types. To tackle this issue, we propose the SKD-NER model, an efficient continual learning NER model based on the span-based approach, which innovatively incorporates reinforcement learning strategies to enhance the model’s ability against catastrophic forgetting. Specifically, we leverage knowledge distillation (KD) to retain memory and employ reinforcement learning strategies during the KD process to optimize the soft labeling and distillation losses generated by the teacher model to effectively prevent catastrophic forgetting during continual learning. This approach effectively prevents or mitigates catastrophic forgetting during continuous learning, allowing the model to retain previously learned knowledge while acquiring new knowledge. Our experiments on two benchmark datasets demonstrate that our model significantly improves the performance of the CL-NER task, outperforming state-of-the-art methods.

2022

Curriculum Prompt Learning with Self-Training for Abstractive Dialogue Summarization
Changqun Li | Linlin Wang | Xin Lin | Gerard de Melo | Liang He
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Succinctly summarizing dialogue is a task of growing interest, but inherent challenges, such as insufficient training data and low information density impede our ability to train abstractive models. In this work, we propose a novel curriculum-based prompt learning method with self-training to address these problems. Specifically, prompts are learned using a curriculum learning strategy that gradually increases the degree of prompt perturbation, thereby improving the dialogue understanding and modeling capabilities of our model. Unlabeled dialogue is incorporated by means of self-training so as to reduce the dependency on labeled data. We further investigate topic-aware prompts to better plan for the generation of summaries. Experiments confirm that our model substantially outperforms strong baselines and achieves new state-of-the-art results on the AMI and ICSI datasets. Human evaluations also show the superiority of our model with regard to the summary generation quality.

Multi-Scale Distribution Deep Variational Autoencoder for Explanation Generation
ZeFeng Cai | Linlin Wang | Gerard de Melo | Fei Sun | Liang He
Findings of the Association for Computational Linguistics: ACL 2022

Generating explanations for recommender systems is essential for improving their transparency, as users often wish to understand the reason for receiving a specified recommendation. Previous methods mainly focus on improving the generation quality, but often produce generic explanations that fail to incorporate user and item specific details. To resolve this problem, we present Multi-Scale Distribution Deep Variational Autoencoders (MVAE).These are deep hierarchical VAEs with a prior network that eliminates noise while retaining meaningful signals in the input, coupled with a recognition network serving as the source of information to guide the learning of the prior network. Further, the Multi-scale distribution Learning Framework (MLF) along with a Target Tracking Kullback-Leibler divergence (TKL) mechanism are proposed to employ multi KL divergences at different scales for more effective learning. Extensive empirical experiments demonstrate that our methods can generate explanations with concrete input-specific contents.

A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck
Jie Zhou | Qi Zhang | Qin Chen | Qi Zhang | Liang He | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Event argument extraction (EAE) aims to extract arguments with given roles from texts, which have been widely studied in natural language processing. Most previous works have achieved good performance in specific EAE datasets with dedicated neural architectures. Whereas, these architectures are usually difficult to adapt to new datasets/scenarios with various annotation schemas or formats. Furthermore, they rely on large-scale labeled data for training, which is unavailable due to the high labelling cost in most cases. In this paper, we propose a multi-format transfer learning model with variational information bottleneck, which makes use of the information especially the common knowledge in existing datasets for EAE in new datasets. Specifically, we introduce a shared-specific prompt framework to learn both format-shared and format-specific knowledge from datasets with different formats. In order to further absorb the common knowledge for EAE and eliminate the irrelevant noise, we integrate variational information bottleneck into our architecture to refine the shared representation. We conduct extensive experiments on three benchmark datasets, and obtain new state-of-the-art performance on EAE.

ECNU_ICA at SemEval-2022 Task 10: A Simple and Unified Model for Monolingual and Crosslingual Structured Sentiment Analysis
Qi Zhang | Jie Zhou | Qin Chen | Qingchun Bai | Jun Xiao | Liang He
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Sentiment analysis is increasingly viewed as a vital task both from an academic and a commercial standpoint. In this paper, we focus on the structured sentiment analysis task that is released on SemEval-2022 Task 10. The task aims to extract the structured sentiment information (e.g., holder, target, expression and sentiment polarity) in a text. We propose a simple and unified model for both the monolingual and crosslingual structured sentiment analysis tasks. We translate this task into an event extraction task by regrading the expression as the trigger word and the other elements as the arguments of the event. Particularly, we first extract the expression by judging its start and end indices. Then, to consider the expression, we design a conditional layer normalization algorithm to extract the holder and target based on the extracted expression. Finally, we infer the sentiment polarity based on the extracted structured information. Pre-trained language models are utilized to obtain the text representation. We conduct the experiments on seven datasets in five languages. It attracted 233 submissions in monolingual subtask and crosslingual subtask from 32 teams. Finally, we obtain the top 5 place on crosslingual tasks.

An Information Minimization Based Contrastive Learning Model for Unsupervised Sentence Embeddings Learning
Shaobin Chen | Jie Zhou | Yuling Sun | Liang He
Proceedings of the 29th International Conference on Computational Linguistics

Unsupervised sentence embeddings learning has been recently dominated by contrastive learning methods (e.g., SimCSE), which keep positive pairs similar and push negative pairs apart. The contrast operation aims to keep as much information as possible by maximizing the mutual information between positive instances, which leads to redundant information in sentence embedding. To address this problem, we present an information minimization based contrastive learning InforMin-CL model to retain the useful information and discard the redundant information by maximizing the mutual information and minimizing the information entropy between positive instances meanwhile for unsupervised sentence representation learning. Specifically, we find that information minimization can be achieved by simple contrast and reconstruction objectives. The reconstruction operation reconstitutes the positive instance via the other positive instance to minimize the information entropy between positive instances. We evaluate our model on fourteen downstream tasks, including both supervised and unsupervised (semantic textual similarity) tasks. Extensive experimental results show that our InforMin-CL obtains a state-of-the-art performance.

2021

Attending via both Fine-tuning and Compressing
Jie Zhou | Yuanbin Wu | Qin Chen | Xuanjing Huang | Liang He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

ECNUICA at SemEval-2021 Task 11: Rule based Information Extraction Pipeline
Jiaju Lin | Jing Ling | Zhiwei Wang | Jiawei Liu | Qin Chen | Liang He
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents our endeavor for solving task11, NLPContributionGraph, of SemEval-2021. The purpose of the task was to extract triples from a paper in the Nature Language Processing field for constructing an Open Research Knowledge Graph. The task includes three sub-tasks: detecting the contribution sentences in papers, identifying scientific terms and predicate phrases from the contribution sentences; and inferring triples in the form of (subject, predicate, object) as statements for Knowledge Graph building. In this paper, we apply an ensemble of various fine-tuned pre-trained language models (PLM) for tasks one and two. In addition, self-training methods are adopted for tackling the shortage of annotated data. For the third task, rather than using classic neural open information extraction (OIE) architectures, we generate potential triples via manually designed rules and develop a binary classifier to differentiate positive ones from others. The quantitative results show that we obtain the 4th, 2nd, and 2nd rank in three evaluation phases.

KERS: A Knowledge-Enhanced Framework for Recommendation Dialog Systems with Multiple Subgoals
Jun Zhang | Yan Yang | Chencai Chen | Liang He | Zhou Yu
Findings of the Association for Computational Linguistics: EMNLP 2021

Recommendation dialogs require the system to build a social bond with users to gain trust and develop affinity in order to increase the chance of a successful recommendation. It is beneficial to divide up, such conversations with multiple subgoals (such as social chat, question answering, recommendation, etc.), so that the system can retrieve appropriate knowledge with better accuracy under different subgoals. In this paper, we propose a unified framework for common knowledge-based multi-subgoal dialog: knowledge-enhanced multi-subgoal driven recommender system (KERS). We first predict a sequence of subgoals and use them to guide the dialog model to select knowledge from a sub-set of existing knowledge graph. We then propose three new mechanisms to filter noisy knowledge and to enhance the inclusion of cleaned knowledge in the dialog response generation process. Experiments show that our method obtains state-of-the-art results on DuRecDial dataset in both automatic and human evaluation.

ECNU_ICA_1 SemEval-2021 Task 4: Leveraging Knowledge-enhanced Graph Attention Networks for Reading Comprehension of Abstract Meaning
Pingsheng Liu | Linlin Wang | Qian Zhao | Hao Chen | Yuxi Feng | Xin Lin | Liang He
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our system for SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning. To accomplish this task, we utilize the Knowledge-Enhanced Graph Attention Network (KEGAT) architecture with a novel semantic space transformation strategy. It leverages heterogeneous knowledge to learn adequate evidences, and seeks for an effective semantic space of abstract concepts to better improve the ability of a machine in understanding the abstract meaning of natural language. Experimental results show that our system achieves strong performance on this task in terms of both imperceptibility and nonspecificity.

Is “hot pizza” Positive or Negative? Mining Target-aware Sentiment Lexicons
Jie Zhou | Yuanbin Wu | Changzhi Sun | Liang He
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Modelling a word’s polarity in different contexts is a key task in sentiment analysis. Previous works mainly focus on domain dependencies, and assume words’ sentiments are invariant within a specific domain. In this paper, we relax this assumption by binding a word’s sentiment to its collocation words instead of domain labels. This finer view of sentiment contexts is particularly useful for identifying commonsense sentiments expressed in neural words such as “big” and “long”. Given a target (e.g., an aspect), we propose an effective “perturb-and-see” method to extract sentiment words modifying it from large-scale datasets. The reliability of the obtained target-aware sentiment lexicons is extensively evaluated both manually and automatically. We also show that a simple application of the lexicon is able to achieve highly competitive performances on the unsupervised opinion relation extraction task.

2020

SEEK: Segmented Embedding of Knowledge Graphs
Wentao Xu | Shun Zheng | Liang He | Bin Shao | Jian Yin | Tie-Yan Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In recent years, knowledge graph embedding becomes a pretty hot research topic of artificial intelligence and plays increasingly vital roles in various downstream applications, such as recommendation and question answering. However, existing methods for knowledge graph embedding can not make a proper trade-off between the model complexity and the model expressiveness, which makes them still far from satisfactory. To mitigate this problem, we propose a lightweight modeling framework that can achieve highly competitive relational expressiveness without increasing the model complexity. Our framework focuses on the design of scoring functions and highlights two critical characteristics: 1) facilitating sufficient feature interactions; 2) preserving both symmetry and antisymmetry properties of relations. It is noteworthy that owing to the general and elegant design of scoring functions, our framework can incorporate many famous existing methods as special cases. Moreover, extensive experiments on public benchmarks demonstrate the efficiency and effectiveness of our framework. Source codes and data can be found at https://github.com/Wentao-Xu/SEEK.

ECNU-SenseMaker at SemEval-2020 Task 4: Leveraging Heterogeneous Knowledge Resources for Commonsense Validation and Explanation
Qian Zhao | Siyu Tao | Jie Zhou | Linlin Wang | Xin Lin | Liang He
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes our system for SemEval-2020 Task 4: Commonsense Validation and Explanation (Wang et al., 2020). We propose a novel Knowledge-enhanced Graph Attention Network (KEGAT) architecture for this task, leveraging heterogeneous knowledge from both the structured knowledge base (i.e. ConceptNet) and unstructured text to better improve the ability of a machine in commonsense understanding. This model has a powerful commonsense inference capability via utilizing suitable commonsense incorporation methods and upgraded data augmentation techniques. Besides, an internal sharing mechanism is cooperated to prohibit our model from insufficient and excessive reasoning for commonsense. As a result, this model performs quite well in both validation and explanation. For instance, it achieves state-of-the-art accuracy in the subtask called Commonsense Explanation (Multi-Choice). We officially name the system as ECNU-SenseMaker. Code is publicly available at https://github.com/ECNU-ICA/ECNU-SenseMaker.

SAS: Dialogue State Tracking via Slot Attention and Slot Information Sharing
Jiaying Hu | Yan Yang | Chencai Chen | Liang He | Zhou Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Dialogue state tracker is responsible for inferring user intentions through dialogue history. Previous methods have difficulties in handling dialogues with long interaction context, due to the excessive information. We propose a Dialogue State Tracker with Slot Attention and Slot Information Sharing (SAS) to reduce redundant information’s interference and improve long dialogue context tracking. Specially, we first apply a Slot Attention to learn a set of slot-specific features from the original dialogue and then integrate them using a slot information sharing module. Our model yields a significantly improved performance compared to previous state-of the-art models on the MultiWOZ dataset.

Integrating BERT and Score-based Feature Gates for Chinese Grammatical Error Diagnosis
Yongchang Cao | Liang He | Robert Ridley | Xinyu Dai
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

This paper describes our proposed model for the Chinese Grammatical Error Diagnosis (CGED) task in NLPTEA2020. The goal of CGED is to use natural language processing techniques to automatically diagnose Chinese grammatical errors in sentences. To this end, we design and implement a CGED model named BERT with Score-feature Gates Error Diagnoser (BSGED), which is based on the BERT model, Bidirectional Long Short-Term Memory (BiLSTM) and conditional random field (CRF). In order to address the problem of losing partial-order relationships when embedding continuous feature items as with previous works, we propose a gating mechanism for integrating continuous feature items, which effectively retains the partial-order relationships between feature items. We perform LSTM processing on the encoding result of the BERT model, and further extract the sequence features. In the final test-set evaluation, we obtained the highest F1 score at the detection level and are among the top 3 F1 scores at the identification level.

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou | Junfeng Tian | Rui Wang | Yuanbin Wu | Wenming Xiao | Liang He
Proceedings of the 28th International Conference on Computational Linguistics

Pre-trained language models have been widely applied to cross-domain NLP tasks like sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets, and utilize it for cross-domain sentiment analysis task without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both token and sentence levels, such as emoticons, sentiment words, and ratings, without human interference. A series of experiments are conducted and the results indicate the great advantages of our model. We obtain new state-of-the-art results in all the cross-domain sentiment analysis tasks, and our proposed SentiX can be trained with only 1% samples (18 samples) and it achieves better performance than BERT with 90% samples.

2019

Exploiting Noisy Data in Distant Supervision Relation Classification
Kaijia Yang | Liang He | Xin-yu Dai | Shujian Huang | Jiajun Chen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Distant supervision has obtained great progress on relation classification task. However, it still suffers from noisy labeling problem. Different from previous works that underutilize noisy data which inherently characterize the property of classification, in this paper, we propose RCEND, a novel framework to enhance Relation Classification by Exploiting Noisy Data. First, an instance discriminator with reinforcement learning is designed to split the noisy data into correctly labeled data and incorrectly labeled data. Second, we learn a robust relation classifier in semi-supervised learning way, whereby the correctly and incorrectly labeled data are treated as labeled and unlabeled data respectively. The experimental results show that our method outperforms the state-of-the-art models.

2018

EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words
Shiyun Chen | Maoquan Wang | Liang He
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents our single model to Subtask 1 of SemEval 2018 Task 2: Emoji Prediction in English. In order to predict the emoji that may be contained in a tweet, the basic model we use is an attention-based recurrent neural network which has achieved satisfactory performs in Natural Language processing. Considering the text comes from social media, it contains many discrepant abbreviations and online terms, we also combine word-level and character-level word vector embedding to better handling the words not appear in the vocabulary. Our single model1 achieved 29.50% Macro F-score in test data and ranks 9th among 48 teams.

2006

Chinese Named Entity Recognition with a Multi-Phase Model
Junsheng Zhou | Liang He | Xinyu Dai | Jiajun Chen
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

Co-authors

Xuan-Jing Huang (黄萱菁) 2

Shizhou Huang 2

Zetian Ouyang 2

Xiaoling Wang 2

Yongchang Cao 1

Zhenxiao Cheng 1

Shujian Huang (书剑黄) 1

Lingxing Kong 1

Pingsheng Liu 1

Robert Ridley 1

Wenzhong Yang 1

Jianbing Zhang 1

Shunfan Zheng 1

Junsheng Zhou (周俊生) 1

Venues