Ziqiang Cao - ACL Anthology

Ziqiang Cao

2026

Interleaved Tool-Call Reasoning for Protein Function Understanding
Chuanliu Fan | Zicheng Ma | Huanran Meng | Aijia Zhang | Wenjie Du | Jun Zhang | Ziqiang Cao | Guohong Fu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in large language models (LLMs) have highlighted the effectiveness of chain-of-thought reasoning in symbolic domains such as mathematics and programming. However, our study shows that directly transferring such text-based reasoning paradigms to protein function understanding is ineffective: reinforcement learning mainly amplifies superficial keyword patterns while failing to introduce new biological knowledge, resulting in limited generalization. We argue that protein function prediction is a knowledge-intensive scientific task that fundamentally relies on external biological priors and computational tools rather than purely internal reasoning. To address this gap, we propose Protein Function Understanding Agent (PFUA), a tool-augmented protein reasoning agent that unifies problem decomposition, tool invocation, and grounded answer generation. Instead of relying on long unconstrained reasoning traces, PFUA integrates domain-specific tools to produce verifiable intermediate evidence. Experiments on four benchmarks demonstrate that PFUA consistently outperforms text-only reasoning models with an average performance improvement of 103%. We believe PFUA has the potential to become a standard paradigm for agentic reasoning in knowledge-intensive life science domains.

The Bidirectional Process Reward Model
Lingyin Zhang | Jun Gao | Xiaoxue Ren | Ziqiang Cao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Process Reward Models (PRMs), which assign fine-grained scores to intermediate reasoning steps within a solution trajectory, have emerged as a promising approach to enhance the reasoning quality of Large Language Models (LLMs). However, most existing PRMs rely on a unidirectional left-to-right (L2R) evaluation scheme, which restricts their utilization of global context. In light of this challenge, we propose a novel bidirectional evaluation paradigm, named Bidirectional Process Reward Model (BiPRM). BiPRM incorporates a parallel right-to-left (R2L) evaluation stream, implemented via prompt reversal, alongside the conventional L2R flow. Then a gating mechanism is introduced to adaptively fuse the reward scores from both streams to yield a holistic quality assessment. Remarkably, compared to the original PRM, BiPRM introduces only a 0.3% parameter increase for the gating module, and the parallel execution of two streams incurs merely 5% inference time latency. Our extensive empirical evaluations spanning diverse benchmarks, LLM backbones, PRM objectives and sampling policies demonstrate that BiPRM consistently surpasses unidirectional baselines, achieving an average relative gain of 10.6% over 54 solution-level configurations and 37.7% in 12 step-level error detection scenarios. Generally, our results highlight the effectiveness, robustness and general applicability of BiPRM, offering a promising new direction for process-based reward modeling.
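
As a rough illustration of the fusion step this abstract describes, the sketch below combines per-step L2R and R2L reward scores through a sigmoid gate. The gate form, the parameter names (`w_l2r`, `w_r2l`, `bias`) and the function name are assumptions made for illustration; the abstract does not specify how the gating module is built.

```python
import math


def gated_fusion(l2r_scores, r2l_scores, w_l2r=1.0, w_r2l=1.0, bias=0.0):
    """Fuse two per-step score streams with a scalar sigmoid gate.

    Hypothetical gate: g = sigmoid(w_l2r * s_l2r + w_r2l * s_r2l + bias).
    The fused score is the convex combination g * s_l2r + (1 - g) * s_r2l.
    """
    if len(l2r_scores) != len(r2l_scores):
        raise ValueError("both streams must score the same steps")
    fused = []
    for s_l, s_r in zip(l2r_scores, r2l_scores):
        gate = 1.0 / (1.0 + math.exp(-(w_l2r * s_l + w_r2l * s_r + bias)))
        fused.append(gate * s_l + (1.0 - gate) * s_r)
    return fused
```

Because the gate lies strictly between 0 and 1, each fused score stays between the two stream scores for that step, so a step on which the streams agree keeps its score.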

2025

Personalized Large Language Model Assistant with Evolving Conditional Memory
Ruifeng Yuan | Shichao Sun | Yongqi Li | Zili Wang | Ziqiang Cao | Wenjie Li
Proceedings of the 31st International Conference on Computational Linguistics

With the rapid development of large language models, AI assistants like ChatGPT have become increasingly integrated into people’s work and lives but are limited in personalized services. In this paper, we present a plug-and-play framework that could facilitate personalized large language model assistants with evolving conditional memory. The personalized assistant focuses on intelligently preserving the knowledge and experience from the dialogue history with the user, which can be applied to future tailored responses that better align with the user’s preferences. Generally, the assistant generates a set of records from the dialogue, stores them in a memory bank, and retrieves related memory to improve the quality of the response. For the crucial memory design, we explore different ways of constructing the memory and propose a new memorizing mechanism named conditional memory to enhance the memory management of the framework. We also investigate the retrieval and usage of memory in the generation process. To better evaluate the personalized assistants’ abilities, we build the first evaluation benchmark from three critical aspects: continuing previous dialogue, learning personalized knowledge and learning from user feedback. The experimental results illustrate the effectiveness of our method.

UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation
Jun Gao | Qi Lv | Zili Wang | Tianxiang Wu | Ziqiang Cao | Wenjie Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In-context learning (ICL) enhances the reasoning abilities of Large Language Models (LLMs) by prepending a few demonstrations. It motivates researchers to introduce more examples to provide additional contextual information for the generation. However, existing methods show a significant limitation due to the problem of excessive growth in context length, which causes a large hardware burden. Additionally, shallow-relevant examples selected by off-the-shelf tools hinder LLMs from capturing useful contextual information for generation. In this paper, to address these limitations, we propose UniICL, a novel Unified ICL framework that unifies demonstration compression, demonstration selection, and final response generation. Furthermore, to avoid repeated compression of the same demonstration and boost inference efficiency, we design a tailored compression strategy that allows UniICL to cache compression results in a Demonstration Bank (DB). Extensive out-of-domain evaluations prove the advantages of UniICL in both effectiveness and efficiency.
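
The caching idea in this abstract can be pictured as plain memoization: compress each demonstration once, then reuse the stored result. The sketch below is a minimal stand-in under that assumption; the class interface and the `toy_compress` function are hypothetical and do not reflect UniICL's actual compressor or Demonstration Bank.

```python
class DemonstrationBank:
    """Cache compressed demonstrations so each one is compressed only once."""

    def __init__(self, compress_fn):
        self._compress_fn = compress_fn  # any callable: str -> compressed form
        self._cache = {}
        self.misses = 0  # number of times the compressor actually ran

    def get(self, demonstration):
        # Compress on first sight, then serve later requests from the cache.
        if demonstration not in self._cache:
            self.misses += 1
            self._cache[demonstration] = self._compress_fn(demonstration)
        return self._cache[demonstration]


def toy_compress(text):
    # Hypothetical stand-in for a learned compressor: keep the first three words.
    return " ".join(text.split()[:3])
```

Requesting the same demonstration twice through `DemonstrationBank(toy_compress).get(...)` runs the compressor only on the first call, which is the saving the abstract attributes to the bank.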

2024

CoUDA: Coherence Evaluation via Unified Data Augmentation
Dawei Zhu | Wenhao Wu | Yifan Song | Fangwei Zhu | Ziqiang Cao | Sujian Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models. However, previous augmentations for this task primarily rely on heuristic rules, lacking design criteria as guidance. In this paper, we take inspiration from the linguistic theory of discourse structure, and propose a data augmentation framework named CoUDA. CoUDA breaks down discourse coherence into global and local aspects, and designs augmentation strategies for both aspects, respectively. Especially for local coherence, we propose a novel generative strategy for constructing augmentation samples, which involves post-pretraining a generative model and applying two controlling mechanisms to control the difficulty of generated samples. During inference, CoUDA also jointly evaluates both global and local aspects to comprehensively assess the overall coherence of a discourse. Extensive experiments in coherence evaluation show that, with only 233M parameters, CoUDA achieves state-of-the-art performance in both pointwise scoring and pairwise ranking tasks, even surpassing recent GPT-3.5 and GPT-4 based metrics.

Improving Copy-oriented Text Generation via EDU Copy Mechanism
Tianxiang Wu | Han Chen | Luozheng Qin | Ziqiang Cao | Chunhui Ai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Many text generation tasks are copy-oriented. For instance, nearly 30% of the content of news summaries is copied. The copy rate is even higher in Grammatical Error Correction (GEC). However, existing generative models generate texts through word-by-word decoding, which may lead to factual inconsistencies and slow inference. Because Elementary Discourse Units (EDUs) are outstanding extraction units, EDU-based extractive methods can alleviate the aforementioned problems. We therefore propose EDUCopy, a framework that integrates the behavior of copying EDUs into generative models. The main idea of EDUCopy is to use special index tags to represent the copied EDUs during generation. Specifically, we extract important EDUs from input sequences, finetune generative models to generate sequences with special index tags, and restore the generated special index tags into corresponding text spans. By doing so, EDUCopy reduces the number of generated tokens significantly. To verify the effectiveness of EDUCopy, we conduct experiments on the news summarization datasets CNNDM, NYT and the GEC datasets FCE, WI-LOCNESS. Our models achieve notable ROUGE and M2 scores, and GPT-4 evaluation validates their strength in terms of factual consistency, fluency, and overall performance. Moreover, compared to baseline models, EDUCopy achieves a significant acceleration of 1.65x.
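
The final restoration step in this abstract, mapping generated index tags back to the copied text spans, can be sketched as a simple substitution. The tag format `<edu_N>` and the function name are assumptions for illustration; the abstract does not state what the special index tags look like.

```python
import re

# Hypothetical tag format: <edu_0>, <edu_1>, ... indexing the extracted EDUs.
TAG_PATTERN = re.compile(r"<edu_(\d+)>")


def restore_edu_tags(generated, edus):
    """Replace each index tag in `generated` with the EDU text it points to."""

    def _lookup(match):
        index = int(match.group(1))
        # Leave a tag untouched if it points outside the extracted EDU list.
        return edus[index] if index < len(edus) else match.group(0)

    return TAG_PATTERN.sub(_lookup, generated)
```

Under these assumptions, a model that emits `<edu_1> , officials said .` decodes a handful of tokens, and the copied span is filled in afterwards from the extracted EDU list.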

Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization
Shichao Sun | Ruifeng Yuan | Ziqiang Cao | Wenjie Li | Pengfei Liu
Findings of the Association for Computational Linguistics: ACL 2024

2023

KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model
Lei Geng | Xu Yan | Ziqiang Cao | Juntao Li | Wenjie Li | Sujian Li | Xinjie Zhou | Yang Yang | Jun Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Most biomedical pretrained language models are monolingual and cannot handle the growing cross-lingual requirements. The scarcity of non-English domain corpora, not to mention parallel data, poses a significant hurdle in training multilingual biomedical models. Since knowledge forms the core of domain-specific corpora and can be translated into various languages accurately, we propose a model called KBioXLM, which transforms the multilingual pretrained model XLM-R into the biomedical domain using a knowledge-anchored approach. We build a biomedical multilingual corpus by incorporating knowledge alignments at three granularities (entity, fact, and passage levels) into monolingual corpora. Then we design three corresponding training tasks (entity masking, relation masking, and passage relation prediction) and continue training on top of the XLM-R model to enhance its domain cross-lingual ability. To validate the effectiveness of our model, we translate the English benchmarks of multiple tasks into Chinese. Experimental results demonstrate that our model significantly outperforms monolingual and multilingual pretrained models in cross-lingual zero-shot and few-shot scenarios, achieving improvements of up to 10+ points.

Diffusion Language Model with Query-Document Relevance for Query-Focused Summarization
Shaoyao Huang | Luozheng Qin | Ziqiang Cao
Findings of the Association for Computational Linguistics: EMNLP 2023

Query-Focused Summarization (QFS) aims to generate summaries from source documents that can answer specific queries. Although the QFS task has gained increasing attention recently, its development is constrained by the fact that mainstream QFS models are BART variants, which are autoregressive and suffer from long-term dependencies and exposure bias. To address these problems, we adopt a diffusion language model that performs well in non-autoregressive scenarios to effectively resolve issues related to autoregressive methods. However, QFS requires guidance from queries to generate adequate summaries, while diffusion language models have limited sensitivity to queries. In this paper, we propose QFS-DLM, a non-autoregressive diffusion language model that incorporates query-document fragment relevance and query-document global relevance to enhance the adaptability of QFS tasks. Firstly, we extract key fragments from documents based on queries and assign higher weights to them, thereby emphasizing crucial and continuous information within the document. Secondly, we calculate global relevance scores between queries and documents, and then integrate these scores into the model’s loss function, enabling the model to prefer high-quality data and distance itself from low-quality data. Overall, our method achieves state-of-the-art performance on Debatepedia and PubMedQA datasets in ROUGE scores, GPT-4, and human evaluations.

Data Selection Curriculum for Abstractive Text Summarization
Shichao Sun | Ruifeng Yuan | Jianfei He | Ziqiang Cao | Wenjie Li | Xiaohua Jia
Findings of the Association for Computational Linguistics: EMNLP 2023

Abstractive Text Summarization (ATS) models are commonly trained using large-scale data that is randomly shuffled. However, the impact of data selection and data ordering on ATS models remains a relatively unexplored research area, where a significant challenge lies in accurately assessing the learning difficulty of each training instance. This study introduces a Data Selection Curriculum (DSC) scoring system that incorporates both the difficulty of improving the ATS model via an instance and the expected performance on this instance. By selectively excluding excessively simple and overly complex instances, the training efficiency can be optimized. Furthermore, curriculum learning is integrated to accelerate convergence and improve performance by gradually increasing the learning difficulty, inspired by human learners. Experimental results on the CNN/DailyMail dataset demonstrate that our approach surpasses strong baselines while using only 20% of the available instances.
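
The two ideas in this abstract, dropping instances that are too easy or too hard and then training from easy to hard, can be sketched as a filter followed by a sort. The sketch assumes each instance already carries a difficulty score in [0, 1]; the thresholds `low` and `high` and the function name are hypothetical, and the DSC scoring system itself is not reproduced here.

```python
def select_and_order(instances, low=0.2, high=0.8):
    """Keep instances with low <= difficulty <= high, ordered easy to hard.

    `instances` is an iterable of (example, difficulty) pairs. The difficulty
    values are assumed to come from some external scoring step.
    """
    kept = [(example, score) for example, score in instances if low <= score <= high]
    kept.sort(key=lambda pair: pair[1])  # curriculum order: easiest first
    return [example for example, _ in kept]
```

For instance, `select_and_order([("a", 0.9), ("b", 0.5), ("c", 0.1), ("d", 0.3)])` drops the hardest and easiest items and returns `["d", "b"]`.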

Can Diffusion Model Achieve Better Performance in Text Generation ? Bridging the Gap between Training and Inference !
Zecheng Tang | Pinzheng Wang | Keyan Zhou | Juntao Li | Ziqiang Cao | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Diffusion models have been successfully adapted to text generation tasks by mapping the discrete text into the continuous space. However, there exist nonnegligible gaps between training and inference, owing to the absence of the forward process during inference. Thus, the model only predicts based on the previously generated reverse noise rather than the noise computed by the forward process. Besides, the widely-used downsampling strategy in speeding up the inference will cause the mismatch of diffusion trajectories between training and inference. To understand and mitigate the above two types of training-inference discrepancies, we launch a thorough preliminary study. Based on our observations, we propose two simple yet effective methods to bridge the gaps mentioned above, named Distance Penalty and Adaptive Decay Sampling. Extensive experiments on 6 generation tasks confirm the superiority of our methods, which can achieve 100× → 200× speedup with better performance. Our code will be released at https://github.com/CODINNLG/Bridge_Gap_Diffusion.

Separating Context and Pattern: Learning Disentangled Sentence Representations for Low-Resource Extractive Summarization
Ruifeng Yuan | Shichao Sun | Zili Wang | Ziqiang Cao | Wenjie Li
Findings of the Association for Computational Linguistics: ACL 2023

Extractive summarization aims to select a set of salient sentences from the source document to form a summary. Context information has been considered one of the key factors for this task. Meanwhile, there also exist other pattern factors that can identify sentence importance, such as sentence position or certain n-gram tokens. However, such pattern information is only effective in specific datasets or domains and cannot be generalized like the context information when only limited data is available. In this case, current extractive summarization models may suffer from a performance drop when transferring to a new dataset. In this paper, we attempt to apply disentangled representation learning to extractive summarization, and separate the two key factors for the task, context and pattern, for a better generalization ability in the low-resource setting. To achieve this, we propose two groups of losses for encoding and disentangling sentence representations into context representations and pattern representations. In this case, we can either use only the context information in the zero-shot setting or fine-tune the pattern information in the few-shot setting. Experimental results on three summarization datasets from different domains show the effectiveness of our proposed approach.

Dynamic and Efficient Inference for Text Generation via BERT Family
Xiaobo Liang | Juntao Li | Lijun Wu | Ziqiang Cao | Min Zhang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the excellent performance of Pre-trained Language Models on many text generation tasks, they suffer from inefficient inference on computation and memory due to their large-scale parameters and the universal autoregressive decoding paradigm. In this work, we propose a novel fine-tuning method DEER, which can make a single pre-trained model support Dynamic and Efficient infERence and achieve an adaptive trade-off between model performance and latency. In particular, our critical insight is to jointly utilize the non-autoregressive (NAR) generation and dynamic parameter pruning techniques, which can flexibly control the decoding iteration steps and model sizes according to memory and latency limitations. Besides, we also explore the effectiveness of the pre-trained MLMs (i.e., the BERT family) for text generation tasks since their bidirectional attention nature is more suitable for the NAR training objective. Extensive experiments on both monolingual and multilingual pre-trained MLMs demonstrate the effectiveness of our proposed DEER method by consistently achieving (1) higher BLEU scores than the strong autoregressive Transformer model on three neural machine translation tasks with 3 → 12 times speedup, (2) competitive performance (but with much faster inference speed) compared with the BART model on four GLGE benchmark tasks. Our code will be publicly available at GitHub https://github.com/dropreg/DEER.

2022

FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness
Wenhao Wu | Wei Li | Jiachen Liu | Xinyan Xiao | Ziqiang Cao | Sujian Li | Hua Wu
Findings of the Association for Computational Linguistics: EMNLP 2022

Despite being able to generate fluent and grammatical text, current Seq2Seq summarization models still suffer from the unfaithful generation problem. In this paper, we study the faithfulness of existing systems from a new perspective of factual robustness, which is the ability to correctly generate factual information over adversarial unfaithful information. We first measure a model’s factual robustness by its success rate in defending against adversarial attacks when generating factual information. The factual robustness analysis on a wide range of current systems shows its good consistency with human judgments on faithfulness. Inspired by these findings, we propose to improve the faithfulness of a model by enhancing its factual robustness. Specifically, we propose a novel training strategy, namely FRSUM, which teaches the model to defend against both explicit adversarial samples and implicit factual adversarial perturbations. Extensive automatic and human evaluation results show that FRSUM consistently improves the faithfulness of various Seq2Seq models, such as T5 and BART.

Few-shot Query-Focused Summarization with Prefix-Merging
Ruifeng Yuan | Zili Wang | Ziqiang Cao | Wenjie Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Query-focused summarization has been considered an important extension of text summarization. It aims to generate a concise highlight for a given query. Different from text summarization, query-focused summarization has long been plagued by the lack of high-quality large-scale datasets. In this paper, we investigate whether we can integrate and transfer the knowledge of text summarization and question answering to assist few-shot learning in query-focused summarization. Here, we propose prefix-merging, a prefix-based pretraining strategy for few-shot learning in query-focused summarization. Drawing inspiration from prefix-tuning, we integrate the task knowledge from text summarization and question answering into a properly designed prefix and apply the merged prefix to query-focused summarization. With only a small number of trainable parameters, prefix-merging outperforms fine-tuning on query-focused summarization. We further discuss the influence of different prefix designs and propose a visualized explanation for how prefix-merging works.

2021

BASS: Boosting Abstractive Summarization with Unified Semantic Graph
Wenhao Wu | Wei Li | Xinyan Xiao | Jiachen Liu | Ziqiang Cao | Sujian Li | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Abstractive summarization of long documents or multiple documents remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text. In this paper, we present BASS, a novel framework for Boosting Abstractive Summarization based on a unified Semantic graph, which aggregates co-referent phrases distributed across a long range of context and conveys rich relations between phrases. Further, a graph-based encoder-decoder model is proposed to improve both the document representation and summary generation process by leveraging the graph structure. Specifically, several graph augmentation methods are designed to encode both the explicit and implicit relations in the text while the graph-propagation attention mechanism is developed in the decoder to select salient content into the summary. Empirical results show that the proposed architecture brings substantial improvements for both long-document and multi-document summarization tasks.

2018

Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
Ziqiang Cao | Wenjie Li | Sujian Li | Furu Wei
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most previous seq2seq summarization systems purely depend on the source text to generate summaries, which tends to work unstably. Inspired by the traditional template-based summarization approaches, this paper proposes to use existing summaries as soft templates to guide the seq2seq model. To this end, we use a popular IR platform to Retrieve proper summaries as candidate templates. Then, we extend the seq2seq framework to jointly conduct template Reranking and template-aware summary generation (Rewriting). Experiments show that, in terms of informativeness, our model significantly outperforms the state-of-the-art methods, and even soft templates themselves demonstrate high competitiveness. In addition, the import of high-quality external summaries improves the stability and readability of generated summaries.

2017

DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
Yanran Li | Hui Su | Xiaoyu Shen | Wenjie Li | Ziqiang Cao | Shuzi Niu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate in daily life and cover various everyday topics. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems. The dataset is available at http://yanran.li/dailydialog

2016

PolyU at CL-SciSumm 2016
Ziqiang Cao | Wenjie Li | Dapeng Wu
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

AttSum: Joint Learning of Focusing and Summarization with Neural Attention
Ziqiang Cao | Wenjie Li | Sujian Li | Furu Wei | Yanran Li
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Query relevance ranking and sentence saliency ranking are the two main tasks in extractive query-focused summarization. Previous supervised summarization systems often perform the two tasks in isolation. However, since reference summaries are a trade-off between relevance and saliency, neither of the two rankers can be trained well with reference summaries as supervision. This paper proposes a novel summarization system called AttSum, which tackles the two tasks jointly. It automatically learns distributed representations for sentences as well as the document cluster. Meanwhile, it applies the attention mechanism to simulate the attentive reading behavior of humans when a query is given. Extensive experiments are conducted on DUC query-focused summarization benchmark datasets. Without using any hand-crafted features, AttSum achieves competitive performance. We also observe that the sentences recognized to focus on the query indeed meet the query need.

2015

Learning Summary Prior Representation for Extractive Summarization
Ziqiang Cao | Furu Wei | Sujian Li | Wenjie Li | Ming Zhou | Houfeng Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Text-level Discourse Dependency Parsing
Sujian Li | Liang Wang | Ziqiang Cao | Wenjie Li
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Joint Learning of Chinese Words, Terms and Keywords
Ziqiang Cao | Sujian Li | Heng Ji
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
