Liping Jing (景丽萍) - ACL Anthology

Liping Jing

Also published as: 丽萍景

2025

pdf bib abs
Continual Gradient Low-Rank Projection Fine-Tuning for LLMs
Chenxu Wang | Yilin Lyu | Zicheng Sun | Liping Jing
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Continual fine-tuning of Large Language Models (LLMs) is hampered by the trade-off between efficiency and expressiveness. Low-Rank Adaptation (LoRA) offers efficiency but constrains the model’s ability to learn new tasks and transfer knowledge due to its low-rank nature and reliance on explicit parameter constraints. We propose GORP ( ̲Gradient L ̲Ow ̲Rank ̲Projection) for Continual Learning, a novel training strategy that overcomes these limitations by synergistically combining full and low-rank parameters and jointly updating within a unified low-rank gradient subspace. GORP expands the optimization space while preserving efficiency and mitigating catastrophic forgetting. Extensive experiments on continual learning benchmarks demonstrate GORP’s superior performance compared to existing state-of-the-art approaches. Code is available at https://github.com/Wcxwcxw/GORP.

Recent progress in large language models (LLMs) has opened new possibilities for mental health support, yet current approaches lack realism in simulating specialized psychotherapy and fail to capture therapeutic progression over time. Narrative therapy, which helps individuals transform problematic life stories into empowering alternatives, remains underutilized due to limited access and social stigma. We address these limitations through a comprehensive framework with two core components. First, **INT** (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate responses through retrieval-augmentation. Second, **IMA** (Innovative Moment Assessment) provides a therapy-centric evaluation method that quantifies effectiveness by tracking “Innovative Moments” (IMs), critical narrative shifts in client speech signaling therapy progress. Experimental results on 260 simulated clients and 230 human participants reveal that **INT** consistently outperforms standard methods in therapeutic quality and depth. We further demonstrate the effectiveness of **INT** in synthesizing high-quality support conversations to facilitate social applications.

2024

pdf bib abs
基于提示工程和思维链的提示词构造
Yun Luo (罗允) | Yi Feng (冯毅) | Liping Jing (景丽萍)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“儿童故事常识推理与寓意理解评测任务旨在从常识推理和寓意理解两个任务多角度评价中文预训练语言模型和大型语言模型的常识推理和故事理解能力,这考察了模型的常识储备能力以及对文本内容的深入理解能力,因此极具挑战性。随着大语言模型的发展,其卓越的指令跟随能力显著提升了自然语言处理任务的效率和效果。然而,这也对提示词的设计提出了更高的要求,因为提示词的质量直接影响了大模型的表现和预测结果的准确性。因此,设计有效的提示词变得尤为重要,不仅需要理解任务的具体需求,还要具备对语言模型的深入认识和灵活运用能力。本文针对儿童故事常识推理与寓意理解评测赛道一的两个任务,提出了一种基于提示工程的提示词构造方法。首先,我们提出了一种基于融合提示工程、思维链的通用提示词构建框架;然后,我们针对具体的任务调整对应的提示词模板;最后,结合语言模型使用这些提示词进行结果预测。在本次评测中,我们的方法在赛道一的封闭数据条件下获得了第三名的成绩,这验证了我们方法的有效性,并展示了其在自然语言理解领域的应用潜力。”

In noisy label learning, instance selection based on small-loss criteria has been proven to be highly effective. However, in the case of noisy multi-label text classification (NMLTC), the presence of noise is not limited to the instance-level but extends to the (instance-label) pair-level.This gives rise to two main challenges.(1) The loss information at the pair-level fails to capture the variations between instances. (2) There are two types of noise at the pair-level: false positives and false negatives. Identifying false negatives from a large pool of negative pairs presents an exceedingly difficult task. To tackle these issues, we propose a novel approach called instance-label pair correction (iLaCo), which aims to address the problem of noisy pair selection and correction in NMLTC tasks.Specifically, we first introduce a holistic selection metric that identifies noisy pairs by simultaneously considering global loss information and instance-specific ranking information.Secondly, we employ a filter guided by label correlation to focus exclusively on negative pairs with label relevance. This filter significantly reduces the difficulty of identifying false negatives.Experimental analysis indicates that our framework effectively corrects noisy pairs in NMLTC datasets, leading to a significant improvement in model performance.

pdf bib abs
Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction
Mingyang Song | Liping Jing | Yi Feng
Findings of the Association for Computational Linguistics: ACL 2024

Keyphrase extraction aims to automatically extract salient phrases representing the critical information in the source document. Identifying salient phrases is challenging because there is a lot of noisy information in the document, leading to wrong extraction. To address this issue, in this paper, we propose a hybrid matching model for keyphrase extraction, which combines representation-focused and interaction-based matching modules into a unified framework for improving the performance of the keyphrase extraction task. Specifically, HybridMatch comprises (1) a PLM-based Siamese encoder component that represents both candidate phrases and documents, (2) an interaction-focused matching (IM) component that estimates word matches between candidate phrases and the corresponding document at the word level, and (3) a representation-focused matching (RM) component captures context-aware semantic relatedness of each candidate keyphrase at the phrase level. Extensive experimental results on the OpenKP dataset demonstrate that the performance of the proposed model HybridMatch outperforms the recent state-of-the-art keyphrase extraction baselines. Furthermore, we discuss the performance of large language models in keyphrase extraction based on recent studies and our experiments.

pdf bib abs
Enhancing Multi-Label Text Classification under Label-Dependent Noise: A Label-Specific Denoising Framework
Pengyu Xu | Liping Jing | Jian Yu
Findings of the Association for Computational Linguistics: EMNLP 2024

Recent advancements in noisy multi-label text classification have primarily relied on the class-conditional noise (CCN) assumption, which treats each label independently undergoing label flipping to generate noisy labels. However, in real-world scenarios, noisy labels often exhibit dependencies with true labels. In this study, we validate through hypothesis testing that real-world datasets are unlikely to adhere to the CCN assumption, indicating that label noise is dependent on the labels. To address this, we introduce a label-specific denoising framework designed to counteract label-dependent noise. The framework initially presents a holistic selection metric that evaluates noisy labels by concurrently considering loss information, ranking information, and feature centroid. Subsequently, it identifies and corrects noisy labels individually for each label category in a fine-grained manner. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method under both synthetic and real-world noise conditions, significantly improving performance over existing state-of-the-art models.

2023

pdf bib abs
HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction
Mingyang Song | Huafeng Liu | Liping Jing
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Given the exponential growth in the number of documents on the web in recent years, there is an increasing demand for accurate models to extract keyphrases from such documents. Keyphrase extraction is the task of automatically identifying representative keyphrases from the source document. Typically, candidate keyphrases exhibit latent hierarchical structures embedded with intricate syntactic and semantic information. Moreover, the relationships between candidate keyphrases and the document also form hierarchical structures. Therefore, it is essential to consider these latent hierarchical structures when extracting keyphrases. However, many recent unsupervised keyphrase extraction models overlook this aspect, resulting in incorrect keyphrase extraction. In this paper, we address this issue by proposing a new hyperbolic ranking model (HyperRank). HyperRank is designed to jointly model global and local context information for estimating the importance of each candidate keyphrase within the hyperbolic space, enabling accurate keyphrase extraction. Experimental results demonstrate that HyperRank significantly outperforms recent state-of-the-art baselines.

pdf bib abs
Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection
Mingyang Song | Pengyu Xu | Yi Feng | Huafeng Liu | Liping Jing
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Over-generation errors occur when a keyphrase extraction model correctly determines a candidate keyphrase as a keyphrase because it contains a word that frequently appears in the document but at the same time erroneously outputs other candidates as keyphrases because they contain the same word. To mitigate this issue, we propose a new heterogeneous centrality detection approach (CentralityRank), which extracts keyphrases by simultaneously identifying both implicit and explicit centrality within a heterogeneous graph as the importance score of each candidate. More specifically, CentralityRank detects centrality by taking full advantage of the content within the input document to construct graphs that encompass semantic nodes of varying granularity levels, not limited to just phrases. These additional nodes act as intermediaries between candidate keyphrases, enhancing cross-phrase relations. Furthermore, we introduce a novel adaptive boundary-aware regularization that can leverage the position information of candidate keyphrases, thus influencing the importance of candidate keyphrases. Extensive experimental results demonstrate the superiority of CentralityRank over recent state-of-the-art unsupervised keyphrase extraction baselines across three benchmark datasets.

pdf bib abs
A Survey on Recent Advances in Keyphrase Extraction from Pre-trained Language Models
Mingyang Song | Yi Feng | Liping Jing
Findings of the Association for Computational Linguistics: EACL 2023

Keyphrase Extraction (KE) is a critical component in Natural Language Processing (NLP) systems for selecting a set of phrases from the document that could summarize the important information discussed in the document. Typically, a keyphrase extraction system can significantly accelerate the speed of information retrieval and help people get first-hand information from a long document quickly and accurately. Specifically, keyphrases are capable of providing semantic metadata characterizing documents and producing an overview of the content of a document. In this paper, we introduce keyphrase extraction, present a review of the recent studies based on pre-trained language models, offer interesting insights on the different approaches, highlight open issues, and give a comparative experimental study of popular supervised as well as unsupervised techniques on several datasets. To encourage more instantiations, we release the related files mentioned in this paper.

pdf bib abs
Improving Embedding-based Unsupervised Keyphrase Extraction by Incorporating Structural Information
Mingyang Song | Huafeng Liu | Yi Feng | Liping Jing
Findings of the Association for Computational Linguistics: ACL 2023

Keyphrase extraction aims to extract a set of phrases with the central idea of the source document. In a structured document, there are certain locations (e.g., the title or the first sentence) where a keyphrase is most likely to appear. However, when extracting keyphrases from the document, most existing embedding-based unsupervised keyphrase extraction models ignore the indicative role of the highlights in certain locations, leading to wrong keyphrases extraction. In this paper, we propose a new Highlight-Guided Unsupervised Keyphrase Extraction model (HGUKE) to address the above issue. Specifically, HGUKE first models the phrase-document relevance via the highlights of the documents. Next, HGUKE calculates the cross-phrase relevance between all candidate phrases. Finally, HGUKE aggregates the above two relevance as the importance score of each candidate phrase to rank and extract keyphrases. The experimental results on three benchmarks demonstrate that HGUKE outperforms the state-of-the-art unsupervised keyphrase extraction baselines.

pdf bib abs
Unsupervised Keyphrase Extraction by Learning Neural Keyphrase Set Function
Mingyang Song | Haiyun Jiang | Lemao Liu | Shuming Shi | Liping Jing
Findings of the Association for Computational Linguistics: ACL 2023

We create a paradigm shift concerning building unsupervised keyphrase extraction systems in this paper. Instead of modeling the relevance between an individual candidate phrase and the document as in the commonly used framework, we formulate the unsupervised keyphrase extraction task as a document-set matching problem from a set-wise perspective, in which the document and the candidate set are globally matched in the semantic space to particularly take into account the interactions among all candidate phrases. Since it is intractable to exactly extract the keyphrase set by the matching function during the inference, we propose an approximate approach, which obtains the candidate subsets via a set extractor agent learned by reinforcement learning. Exhaustive experimental results demonstrate the effectiveness of our model, which outperforms the recent state-of-the-art unsupervised keyphrase extraction baselines by a large margin.

2022

pdf bib
Utilizing BERT Intermediate Layers for Unsupervised Keyphrase Extraction
Mingyang Song | Yi Feng | Liping Jing
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

pdf bib abs
Hyperbolic Relevance Matching for Neural Keyphrase Extraction
Mingyang Song | Yi Feng | Liping Jing
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Keyphrase extraction is a fundamental task in natural language processing that aims to extract a set of phrases with important information from a source document. Identifying important keyphrases is the central component of keyphrase extraction, and its main challenge is learning to represent information comprehensively and discriminate importance accurately. In this paper, to address the above issues, we design a new hyperbolic matching model (HyperMatch) to explore keyphrase extraction in hyperbolic space. Concretely, to represent information comprehensively, HyperMatch first takes advantage of the hidden representations in the middle layers of RoBERTa and integrates them as the word embeddings via an adaptive mixing layer to capture the hierarchical syntactic and semantic structures. Then, considering the latent structure information hidden in natural languages, HyperMatch embeds candidate phrases and documents in the same hyperbolic space via a hyperbolic phrase encoder and a hyperbolic document encoder. To discriminate importance accurately, HyperMatch estimates the importance of each candidate phrase by explicitly modeling the phrase-document relevance via the Poincaré distance and optimizes the whole model by minimizing the hyperbolic margin-based triplet loss. Extensive experiments are conducted on six benchmark datasets and demonstrate that HyperMatch outperforms the recent state-of-the-art baselines.

2021

pdf bib abs
StereoRel: Relational Triple Extraction from a Stereoscopic Perspective
Xuetao Tian | Liping Jing | Lu He | Feng Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Relational triple extraction is critical to understanding massive text corpora and constructing large-scale knowledge graph, which has attracted increasing research interest. However, existing studies still face some challenging issues, including information loss, error propagation and ignoring the interaction between entity and relation. To intuitively explore the above issues and address them, in this paper, we provide a revealing insight into relational triple extraction from a stereoscopic perspective, which rationalizes the occurrence of these issues and exposes the shortcomings of existing methods. Further, a novel model is proposed for relational triple extraction, which maps relational triples to a three-dimension (3-D) space and leverages three decoders to extract them, aimed at simultaneously handling the above issues. A series of experiments are conducted on five public datasets, demonstrating that the proposed model outperforms the recent advanced baselines.

pdf bib abs
Importance Estimation from Multiple Perspectives for Keyphrase Extraction
Mingyang Song | Liping Jing | Lin Xiao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Keyphrase extraction is a fundamental task in Natural Language Processing, which usually contains two main parts: candidate keyphrase extraction and keyphrase importance estimation. From the view of human understanding documents, we typically measure the importance of phrase according to its syntactic accuracy, information saliency, and concept consistency simultaneously. However, most existing keyphrase extraction approaches only focus on the part of them, which leads to biased results. In this paper, we propose a new approach to estimate the importance of keyphrase from multiple perspectives (called as KIEMP) and further improve the performance of keyphrase extraction. Specifically, KIEMP estimates the importance of phrase with three modules: a chunking module to measure its syntactic accuracy, a ranking module to check its information saliency, and a matching module to judge the concept (i.e., topic) consistency between phrase and the whole document. These three modules are seamlessly jointed together via an end-to-end multi-task learning model, which is helpful for three parts to enhance each other and balance the effects of three perspectives. Experimental results on six benchmark datasets show that KIEMP outperforms the existing state-of-the-art keyphrase extraction approaches in most cases.

2020

pdf bib abs
Hyperbolic Capsule Networks for Multi-Label Classification
Boli Chen | Xin Huang | Lin Xiao | Liping Jing
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although deep neural networks are effective at extracting high-level features, classification methods usually encode an input into a vector representation via simple feature aggregation operations (e.g. pooling). Such operations limit the performance. For instance, a multi-label document may contain several concepts. In this case, one vector can not sufficiently capture its salient and discriminative content. Thus, we propose Hyperbolic Capsule Networks (HyperCaps) for Multi-Label Classification (MLC), which have two merits. First, hyperbolic capsules are designed to capture fine-grained document information for each label, which has the ability to characterize complicated structures among labels and documents. Second, Hyperbolic Dynamic Routing (HDR) is introduced to aggregate hyperbolic capsules in a label-aware manner, so that the label-level discriminative information can be preserved along the depth of neural networks. To efficiently handle large-scale MLC datasets, we additionally present a new routing method to adaptively adjust the capsule number during routing. Extensive experiments are conducted on four benchmark datasets. Compared with the state-of-the-art methods, HyperCaps significantly improves the performance of MLC especially on tail labels.

2019

pdf bib abs
Label-Specific Document Representation for Multi-Label Text Classification
Lin Xiao | Xin Huang | Boli Chen | Liping Jing
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multi-label text classification (MLTC) aims to tag most relevant labels for the given document. In this paper, we propose a Label-Specific Attention Network (LSAN) to learn a label-specific document representation. LSAN takes advantage of label semantic information to determine the semantic connection between labels and document for constructing label-specific document representation. Meanwhile, the self-attention mechanism is adopted to identify the label-specific document representation from document content information. In order to seamlessly integrate the above two parts, an adaptive fusion strategy is proposed, which can effectively output the comprehensive label-specific document representation to build multi-label text classifier. Extensive experimental results demonstrate that LSAN consistently outperforms the state-of-the-art methods on four different datasets, especially on the prediction of low-frequency labels. The code and hyper-parameter settings are released to facilitate other researchers.