Yunshi Lan - ACL Anthology

Yunshi Lan

2026

Unsupervised Text Style Transfer for Controllable Intensity
Shuhuan Gu | Wenbiao Tao | Xinchen Ma | Kangkang He | Ye Guo | Xiang Li | Yunshi Lan
Findings of the Association for Computational Linguistics: EACL 2026

Unsupervised Text Style Transfer (UTST) aims to build a system to transfer the stylistic properties of a given text without parallel text pairs. Compared with text transfer between style polarities, UTST for controllable intensity is more challenging due to the subtle differences in stylistic features across different intensity levels. Faced with the challenges posed by the lack of parallel data and the indistinguishability between adjacent intensity levels, we propose an SFT-then-PPO paradigm to fine-tune an LLM. We first fine-tune the LLM with synthesized parallel data. Then, we further train the LLM with PPO, where the rewards are elaborately designed for distinguishing the stylistic intensity in hierarchical levels. Both the global and local stylistic features are considered to formulate the reward functions. The experiments on two UTST benchmarks showcase that both rewards have their advantages and applying them to LLM fine-tuning can effectively improve the performance of an LLM backbone based on various evaluation metrics. Even for adjacent levels of intensity, we can still observe a noticeable stylistic difference among the generated text across these levels.

2025

Initializing and Retrofitting Key-Value Adaptors for Traceable Model Editing
Hanlun Zhu | Yunshi Lan | Xiang Li | Weining Qian
Findings of the Association for Computational Linguistics: ACL 2025

As insight into knowledge storage in language models deepens, the ability to perform CRUD (Create, Read, Update, Delete) operations on language models becomes increasingly indispensable for satisfying the demands of managing rapidly updating knowledge. Considering the high cost of fine-tuning language models, low-cost model editing methods are usually required to manipulate models’ knowledge. The evidence suggests that the modules carrying knowledge in a Transformer module are primarily the MLP blocks; we therefore propose iReVa, a method that explicitly initializes and retrofits key-value pairs into MLP blocks to construct a new mapping of a piece of knowledge without damaging the irrelevant knowledge. In comparison to existing methods, iReVa offers better interpretability and a stronger capacity for carrying traceable edits. Experimental results on a range of GPT-series models show strong performance on edit success and generalization without influencing specificity. We also made the first attempt to conduct a knowledge withdrawal test of iReVa. Our code is available at https://github.com/timberflow/iReVa.

Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction
Xinyuan Li | Yunshi Lan
Proceedings of the 31st International Conference on Computational Linguistics

Large Language Models (LLMs) have achieved outstanding performance across various NLP tasks. Grammatical Error Correction (GEC) is a task aiming at automatically correcting grammatical errors in text, but it encounters a severe shortage of annotated data. Researchers have tried to make full use of the generalization capabilities of LLMs and prompt them to correct erroneous sentences, which however results in unexpected over-correction issues. In this paper, we rethink the role of LLMs in GEC tasks and propose a method, namely TypeDA, considering LLMs as the annotators for type-aware data augmentation in GEC tasks. Different from the existing data augmentation methods, our method prevents in-distribution corruption and is able to generate sentences with multi-granularity error types. Our experiments verify that our method can generally improve the GEC performance of different backbone models with only a small amount of augmented data. Further analyses verify the high consistency and diversity of the pseudo data generated via our method.

VisCGEC: Benchmarking the Visual Chinese Grammatical Error Correction
Xiaoman Wang | Dan Yuan | Xin Liu | Yike Zhao | Xiaoxiao Zhang | Xizhi Chen | Yunshi Lan
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

UnifiedGEC: Integrating Grammatical Error Correction Approaches for Multi-languages with a Unified Framework
Yike Zhao | Xiaoman Wang | Yunshi Lan | Weining Qian
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations

Grammatical Error Correction is an important research direction in the NLP field. Although many models of different architectures and datasets across different languages have been developed to support the research, there is a lack of comprehensive evaluation of these models, and the different architectures make it hard for developers to implement these models on their own. To address this limitation, we present UnifiedGEC, the first open-source GEC-oriented toolkit, which consists of several core components and reusable modules. In UnifiedGEC, we integrate 5 widely-used GEC models and compare their performance on 7 datasets in different languages. Additionally, GEC-related modules such as data augmentation and prompt engineering are also deployed in it. Developers can implement new models and run and evaluate them on existing benchmarks through our framework in a simple way. Code, documents and detailed results of UnifiedGEC are available at https://github.com/AnKate/UnifiedGEC.

SEAGraph: Unveiling the Whole Story of Paper Review Comments
Jianxiang Yu | Jiaqi Tan | Zichen Ding | Jiapeng Zhu | Jiahao Li | Yao Cheng | Qier Cui | Yunshi Lan | Yao Liu | Xiang Li
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Peer review, as a cornerstone of scientific research, ensures the integrity and quality of scholarly work by providing authors with objective feedback for refinement. However, in the traditional peer review process, authors often receive vague or insufficiently detailed feedback, which provides limited assistance and leads to a more time-consuming review cycle. If authors can identify some specific weaknesses in their paper, they can not only address the reviewer’s concerns but also improve their work. This raises the critical question of how to enhance authors’ comprehension of review comments. In this paper, we present SEAGraph, a novel framework developed to clarify review comments by uncovering the underlying intentions behind them. We construct two types of graphs for each paper: the semantic mind graph, which captures the author’s thought process, and the hierarchical background graph, which delineates the research domains related to the paper. A retrieval method is then designed to extract relevant content from both graphs, facilitating coherent explanations for the review comments. Extensive experiments show that SEAGraph excels in review comment understanding tasks, offering significant benefits to authors. By bridging the gap between reviewers’ critiques and authors’ comprehension, SEAGraph contributes to a more efficient, transparent, and collaborative scientific publishing ecosystem. Our code is available at https://anonymous.4open.science/r/seagraph/.

ComRAG: Retrieval-Augmented Generation with Dynamic Vector Stores for Real-time Community Question Answering in Industry
Qinwen Chen | Wenbiao Tao | Zhiwei Zhu | Mingfan Xi | Liangzhong Guo | Yuan Wang | Wei Wang | Yunshi Lan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

Community Question Answering (CQA) platforms can be regarded as important community knowledge bases, but effectively leveraging historical interactions and domain knowledge in real time remains a challenge. Existing methods often underutilize external knowledge, fail to incorporate dynamic historical QA context, or lack memory mechanisms suited for industrial deployment. We propose ComRAG, a retrieval-augmented generation framework for real-time industrial CQA that integrates static knowledge with dynamic historical QA pairs via a centroid-based memory mechanism designed for retrieval, generation, and efficient storage. Evaluated on three industrial CQA datasets, ComRAG consistently outperforms all baselines, achieving up to 25.9% improvement in vector similarity, reducing latency by 8.7%–23.3%, and lowering chunk growth from 20.23% to 2.06% over iterations.

2024

Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting
Siyi Liu | Yang Li | Jiang Li | Shan Yang | Yunshi Lan
Findings of the Association for Computational Linguistics: EMNLP 2024

Recent research in zero-shot Relation Extraction (RE) has focused on using Large Language Models (LLMs) due to their impressive zero-shot capabilities. However, current methods often perform suboptimally, mainly due to a lack of detailed, context-specific prompts needed for understanding various sentences and relations. To address this, we introduce the Self-Prompting framework, a novel method designed to fully harness the embedded RE knowledge within LLMs. Specifically, our framework employs a three-stage diversity approach to prompt LLMs, generating multiple synthetic samples that encapsulate specific relations from scratch. These generated samples act as in-context learning samples, offering explicit and context-specific guidance to efficiently prompt LLMs for RE. Experimental evaluations on benchmark datasets show our approach outperforms existing LLM-based zero-shot RE methods. Additionally, our experiments confirm the effectiveness of our generation pipeline in producing high-quality synthetic data that enhances performance.

An LLM-Enhanced Adversarial Editing System for Lexical Simplification
Keren Tan | Kangyang Luo | Yunshi Lan | Zheng Yuan | Jinlong Shu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, making it challenging to apply in low-resource scenarios. In this paper, we propose a novel LS method without parallel corpora. This method employs an Adversarial Editing System with guidance from a confusion loss and an invariance loss to predict lexical edits in the original sentences. Meanwhile, we introduce an innovative LLM-enhanced loss to enable the distillation of knowledge from Large Language Models (LLMs) into a small-size LS system. From that, complex words within sentences are masked and a Difficulty-aware Filling module is crafted to replace masked positions with simpler words. At last, extensive experimental results and analyses on three benchmark LS datasets demonstrate the effectiveness of our proposed method.

Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis
Jianxiang Yu | Zichen Ding | Jiaqi Tan | Kangyang Luo | Zhenmin Weng | Chenghua Gong | Long Zeng | RenJing Cui | Chengcheng Han | Qiushi Sun | Zhiyong Wu | Yunshi Lan | Xiang Li
Findings of the Association for Computational Linguistics: EMNLP 2024

In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing framework SEA. It comprises three modules: Standardization, Evaluation, and Analysis, which are represented by models SEA-S, SEA-E, and SEA-A, respectively. Initially, SEA-S distills data standardization capabilities of GPT-4 for integrating multiple reviews for a paper. Then, SEA-E utilizes standardized data for fine-tuning, enabling it to generate constructive reviews. Finally, SEA-A introduces a new evaluation metric called mismatch score to assess the consistency between paper contents and reviews. Moreover, we design a self-correction strategy to enhance the consistency. Extensive experimental results on datasets collected from eight venues show that SEA can generate valuable insights for authors to improve their papers.

2023

History Semantic Graph Enhanced Conversational KBQA with Temporal Information Modeling
Hao Sun | Yang Li | Liwei Deng | Bowen Li | Binyuan Hui | Binhua Li | Yunshi Lan | Yan Zhang | Yongbin Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Context information modeling is an important task in conversational KBQA. However, existing methods usually assume the independence of utterances and model them in isolation. In this paper, we propose a History Semantic Graph Enhanced KBQA model (HSGE) that is able to effectively model long-range semantic dependencies in conversation history while maintaining low computational cost. The framework incorporates a context-aware encoder, which employs a dynamic memory decay mechanism and models context at different levels of granularity. We evaluate HSGE on a widely used benchmark dataset for complex sequential question answering. Experimental results demonstrate that it outperforms existing baselines on average across all question types.

R³ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context
Qingyuan Tian | Hanlun Zhu | Lei Wang | Yang Li | Yunshi Lan
Findings of the Association for Computational Linguistics: EMNLP 2023

With the help of Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) have achieved remarkable performance on various reasoning tasks. However, most of them have been evaluated under noise-free context, and the tendency of LLMs to produce inaccurate results under noisy context has not been fully investigated. Existing studies utilize trigger sentences to encourage LLMs to concentrate on the relevant information, but the trigger has limited effect on final answer prediction. Inspired by the interactive CoT method, where intermediate reasoning steps are promoted by multiple rounds of interaction between users and LLMs, we propose a novel prompting method, namely R³ prompting, for CoT reasoning under noisy context. Specifically, R³ prompting interacts with LLMs to perform key sentence extraction, variable declaration and answer prediction, which corresponds to a thought process of reviewing, rephrasing and resolving. The responses generated at the previous interaction serve as hints that guide the responses of the next interaction. Our experiments show that R³ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context. With GPT-3.5-turbo, we observe 3.7% accuracy improvement on average on the reasoning tasks under noisy context compared to the most competitive prompting baseline. More analyses and ablation studies show the robustness and generalization of the R³ prompting method in solving reasoning tasks in LLMs under noisy context.
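The three-round interaction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `r3_style_pipeline`, the `llm` callable (any function mapping a prompt string to a response string), and the wording of all three prompts are assumptions made for illustration. The only behaviour taken from the abstract is the ordering (key sentence extraction, variable declaration, answer prediction) and the reuse of each response as a hint in the next round.

```python
from typing import Callable, Dict


def r3_style_pipeline(problem: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run three chained prompting rounds; each response is a hint for the next.

    `llm` is a hypothetical callable (prompt -> response). The prompt
    wording below is assumed, not taken from the paper.
    """
    # Round 1 (review): extract the key sentences from the noisy problem.
    review = llm(
        f"Problem: {problem}\n"
        "Extract the key sentences needed to answer the question."
    )
    # Round 2 (rephrase): declare variables, using round 1's output as a hint.
    rephrase = llm(
        f"Problem: {problem}\n"
        f"Hint (key sentences): {review}\n"
        "Declare the variables stated in the key sentences."
    )
    # Round 3 (resolve): predict the answer, using round 2's output as a hint.
    resolve = llm(
        f"Problem: {problem}\n"
        f"Hint (variables): {rephrase}\n"
        "Solve the problem and give the final answer."
    )
    return {"review": review, "rephrase": rephrase, "resolve": resolve}
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so a stub function can stand in for a real model when testing the chaining logic.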

Improving Cascade Decoding with Syntax-aware Aggregator and Contrastive Learning for Event Extraction
Zeyu Sheng | Yuanyuan Liang | Yunshi Lan
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

The cascade decoding framework has shown superior performance on event extraction tasks. However, it treats a sentence as a sequence and neglects the potential benefits of the syntactic structure of sentences. In this paper, we improve cascade decoding with a novel module and a self-supervised task. Specifically, we propose a syntax-aware aggregator module to model the syntax of a sentence based on the cascade decoding framework such that it captures event dependencies as well as syntactic information. Moreover, we design a type discrimination task to learn better syntactic representations of different event types, which could further boost the performance of event extraction. Experimental results on two widely used event extraction datasets demonstrate that our method could improve the original cascade decoding framework by up to 2.2 percentage points of F1 score and outperform a number of competitive baseline methods.

Structure-Discourse Hierarchical Graph for Conditional Question Answering on Long Documents
Haowei Du | Yansong Feng | Chen Li | Yang Li | Yunshi Lan | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Conditional question answering on long documents aims to find probable answers and identify conditions that need to be satisfied to make the answers correct over long documents. Existing approaches solve this task by segmenting long documents into multiple sections, and attending to information at global and local tokens to predict the answers and corresponding conditions. However, the natural structure of the document and discourse relations between sentences in each document section are ignored, which are crucial for condition retrieving across sections, as well as logical interaction over the question and conditions. To address this issue, this paper constructs a Structure-Discourse Hierarchical Graph (SDHG) and conducts bottom-up information propagation. First, we build the sentence-level discourse graphs for each section and encode the discourse relations by graph attention. Second, we construct a section-level structure graph based on natural structures, and conduct interactions over the question and contexts. Finally, different levels of representations are integrated for joint answer and condition decoding. The experiments on the benchmark ConditionalQA show our approach gains over the prior state-of-the-art, by 3.0 EM score and 2.4 F1 score on answer measuring, as well as 2.2 EM score and 1.9 F1 score on joint answer and condition measuring.

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Lei Wang | Wanyu Xu | Yihuai Lan | Zhiqiang Hu | Yunshi Lan | Roy Ka-Wei Lee | Ee-Peng Lim
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, Few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual efforts, Zero-shot-CoT concatenates the target problem statement with “Let’s think step by step” as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
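The prompt construction this abstract describes (concatenating the problem statement with a trigger sentence) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the helper name `build_prompt` and the `Q:`/`A:` layout are assumptions, and the Plan-and-Solve trigger below is an assumed wording written to match the "devise a plan, then carry it out" description, not text quoted from this listing.

```python
# Trigger quoted in the abstract for Zero-shot-CoT.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# Assumed Plan-and-Solve style trigger: plan first, then execute the plan.
# The exact wording is an illustration, not taken from the listing above.
PLAN_AND_SOLVE_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate a problem statement with a zero-shot trigger sentence."""
    # The "Q:/A:" layout is a hypothetical formatting choice.
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    question = "A shop sells 3 pens for 6 dollars. How much do 5 pens cost?"
    print(build_prompt(question, ZERO_SHOT_COT_TRIGGER))
    print(build_prompt(question, PLAN_AND_SOLVE_TRIGGER))
```

Only the trigger sentence differs between the two prompts, which is what makes both strategies zero-shot: no hand-written demonstrations are included.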

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
Yuanyuan Liang | Jianing Wang | Hanlun Zhu | Lei Wang | Weining Qian | Yunshi Lan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Owing to the high cost of large-scale question annotation, KBQG methods for low-resource scenarios urgently need to be developed. However, current methods heavily rely on annotated data for fine-tuning, which is not well-suited for few-shot question generation. The emergence of Large Language Models (LLMs) has shown their impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, which is an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem, where the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method KQG-CoT first retrieves supportive logical forms from the unlabeled data pool, taking account of the characteristics of the logical form. Then, we write a prompt to make explicit the reasoning chain of generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ via sorting the logical forms by their complexity. We conduct extensive experiments over three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method could surpass existing few-shot SoTA results of the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.

2022

MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving
Zhenwen Liang | Jipeng Zhang | Lei Wang | Wei Qin | Yunshi Lan | Jie Shao | Xiangliang Zhang
Findings of the Association for Computational Linguistics: NAACL 2022

Math word problem (MWP) solving faces a dilemma in number representation learning. In order to avoid the number representation issue and reduce the search space of feasible solutions, existing works striving for MWP solving usually replace real numbers with symbolic placeholders to focus on logic reasoning. However, different from common symbolic reasoning tasks like program synthesis and knowledge graph reasoning, MWP solving has extra requirements in numerical reasoning. In other words, instead of the number value itself, it is the reusable numerical property that matters more in numerical reasoning. Therefore, we argue that injecting numerical properties into symbolic placeholders with contextualized representation learning schema can provide a way out of the dilemma in the number representation issue here. In this work, we introduce this idea to the popular pre-training language model (PLM) techniques and build MWP-BERT, an effective contextual number representation PLM. We demonstrate the effectiveness of our MWP-BERT on MWP solving and several MWP-specific understanding tasks on both English and Chinese benchmarks.

2021

Modeling Transitions of Focal Entities for Conversational Knowledge Base Question Answering
Yunshi Lan | Jing Jiang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Conversational KBQA is about answering a sequence of questions related to a KB. Follow-up questions in conversational KBQA often have missing information referring to entities from the conversation history. In this paper, we propose to model these implied entities, which we refer to as the focal entities of the conversation. We propose a novel graph-based model to capture the transitions of focal entities and apply a graph neural network to derive a probability distribution of focal entities for each question, which is then combined with a standard KBQA module to perform answer ranking. Our experiments on two datasets demonstrate the effectiveness of our proposed method.

2020

Query Graph Generation for Answering Multi-hop Complex Questions from Knowledge Bases
Yunshi Lan | Jing Jiang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Previous work on answering complex questions from knowledge bases usually separately addresses two types of complexity: questions with constraints and questions with multiple hops of relations. In this paper, we handle both types of complexity at the same time. Motivated by the observation that early incorporation of constraints into query graphs can more effectively prune the search space, we propose a modified staged query graph generation method with more flexible ways to generate query graphs. Our experiments clearly show that our method achieves the state of the art on three benchmark KBQA datasets.

2018

Embedding WordNet Knowledge for Textual Entailment
Yunshi Lan | Jing Jiang
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we study how we can improve a deep learning approach to textual entailment by incorporating lexical entailment relations from WordNet. Our idea is to embed the lexical entailment knowledge contained in WordNet in specially-learned word vectors, which we call “entailment vectors.” We present a standard neural network model and a novel set-theoretic model to learn these entailment vectors from word pairs with known lexical entailment relations derived from WordNet. We further incorporate these entailment vectors into a decomposable attention model for textual entailment and evaluate the model on the SICK and the SNLI dataset. We find that using these special entailment word vectors, we can significantly improve the performance of textual entailment compared with a baseline that uses only standard word2vec vectors. The final performance of our model is close to or above the state of the art, but our method does not rely on any manually-crafted rules or extensive syntactic features.

Co-authors

Yuanyuan Liang 2

Chenghua Gong 1

Liangzhong Guo 1

Chengcheng Han 1

Roy Ka-Wei Lee 1

Zhenwen Liang 1

Qingyuan Tian 1

Xiaoxiao Zhang 1

Xiangliang Zhang 1

Venues