Yi Cai - ACL Anthology

Yi Cai

2025

CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
Jiali Chen | Xusen Hei | HongFei Liu | Yuancheng Wei | Zikun Deng | Jiayuan Xie | Yi Cai | Li Qing
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Computer-aided design (CAD) is crucial in prototyping 3D objects through geometric instructions (i.e., CAD programs). In practical design workflows, designers often spend considerable time reviewing and refining these prototypes by comparing them with reference images. To reduce this manual effort, we introduce the CAD review task, which automatically detects and corrects potential errors to ensure consistency between the constructed 3D objects and the reference images. However, even advanced multimodal large language models (MLLMs) struggle to recognize multiple geometric components and to perform spatial geometric operations within a CAD program, leading to inaccurate reviews. In this paper, we propose the CAD program repairer (ReCAD) framework to effectively detect program errors and provide helpful feedback for error correction. Additionally, we create a dataset, CADReview, consisting of over 20K program-image pairs with diverse errors for the CAD review task. Extensive experiments demonstrate that ReCAD significantly outperforms existing MLLMs, showing great potential for design applications.

Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
Yuqi Bu | Xin Wu | Zirui Zhao | Yi Cai | David Hsu | Qiong Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Visual perspective-taking, the ability to envision others’ perspectives from a single self-perspective, is vital in human-robot interaction. We therefore introduce a human-centric visual grounding task and a dataset to evaluate this ability. Recent vision-language models (VLMs) have shown potential for inferring others’ perspectives, yet they are insensitive to the information differences induced by slight perspective changes. To address this problem, we propose a top-view enhanced perspective transformation (TEP) method, which decomposes the transition from robot to human perspectives through an abstract top-view representation. This representation unifies perspectives and helps capture information differences across diverse perspectives. Experimental results show that TEP improves performance by up to 18%, exhibits perspective-taking abilities across various perspectives, and generalizes effectively to robotic and dynamic scenarios.

Rethinking-based Code Summarization with Chain of Comments
Liuwen Cao | Hongkui He | Hailin Huang | Jiexin Wang | Yi Cai
Proceedings of the 31st International Conference on Computational Linguistics

Automatic code summarization aims to generate concise natural language descriptions (summaries) of source code, freeing software developers from the heavy burden of manual commenting and software maintenance. Existing methods focus on learning a direct mapping from code to summaries, overlooking the significant heterogeneity gap between the two. Moreover, they lack a human-like re-check process to evaluate whether the generated summaries match the code well. To address these two limitations, we introduce RBCoSum, a novel framework that incorporates a generated Chain of Comments (COC) as auxiliary intermediate information to bridge the gap between code and summaries. We also propose a rethinking process in which a ranker, trained on our constructed ranking dataset, scores how well each generated summary matches the code, and the highest-scoring summary is selected, realizing the re-check process. We conduct extensive experiments comparing our approach with other automatic code summarization models as well as multiple code Large Language Models (LLMs). The results show that RBCoSum is effective and outperforms the baselines by a large margin. Human evaluation also shows that the summaries generated with RBCoSum are more natural, informative, useful, and truthful.

Fine-Grained Features-based Code Search for Precise Query-Code Matching
Xinting Zhang | Mengqiu Cheng | Mengzhen Wang | Songwen Gong | Jiayuan Xie | Yi Cai | Qing Li
Proceedings of the 31st International Conference on Computational Linguistics

Code search aims to quickly locate target code snippets in databases using natural language queries, which promotes code reuse. Existing methods can effectively obtain aligned code token-level and query word-level features. However, they usually represent the semantics of code and query by averaging the features of each token and word, respectively, which makes it difficult to accurately capture the code details that are closely related to the query. To address this issue, we propose a fine-grained code search model that consists of a cross-modal encoder, a mapping layer, and a classification layer. Specifically, we use a pre-trained model, GraphCodeBERT, in the cross-modal encoder to align features. In the mapping layer, we introduce a co-attention network to capture fine-grained interactions between code and query, so that the model can precisely identify the key code segments relevant to the query. Finally, in the classification layer, we incorporate instruction learning techniques that leverage contextual reasoning to improve the accuracy of query-code matching. Experimental results show that the proposed model significantly outperforms existing methods across multiple programming language datasets.

Where Confabulation Lives: Latent Feature Discovery in LLMs
Thibaud Ardoin | Yi Cai | Gerhard Wunder
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Hallucination remains a critical failure mode of large language models (LLMs), undermining their trustworthiness in real-world applications. In this work, we focus on confabulation, a foundational aspect of hallucination where the model fabricates facts about unknown entities. We introduce a targeted dataset designed to isolate and analyze this behavior across diverse prompt types. Using this dataset, and building on recent progress in interpreting LLM internals, we extract latent directions associated with confabulation using sparse projections. A simple vector-based steering method demonstrates that these directions can modulate model behavior with minimal disruption, shedding light on the inner representations that drive factual and non-factual output. Our findings contribute to a deeper mechanistic understanding of LLMs and pave the way toward more trustworthy and controllable generation. We release the code and dataset at https://github.com/Thibaud-Ardoin/where-confabulation-lives.
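Vector-based steering of the kind mentioned above is commonly implemented by adding a scaled direction to a hidden activation. The following is a minimal sketch of that generic operation on plain Python lists, not the authors' code: the helper names `steer` and `projection`, the unit-normalization of the direction, and the strength `alpha` are illustrative assumptions, and the sparse-projection step that extracts confabulation directions is not shown.

```python
import math
from typing import List

Vector = List[float]


def _norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def projection(hidden: Vector, direction: Vector) -> float:
    """Scalar projection of `hidden` onto `direction` (how strongly it is expressed)."""
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / n


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Hypothetical helper: shift `hidden` by `alpha` along the unit-normalized direction.

    Positive alpha pushes toward the direction, negative alpha pushes away from it.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / n for d in direction]
    return [h + alpha * u for h, u in zip(hidden, unit)]


h = [0.5, -1.0, 2.0]   # toy hidden state (assumption)
d = [3.0, 0.0, 4.0]    # toy "confabulation" direction (assumption)
print(round(projection(h, d), 6))  # 1.9
steered = steer(h, d, -projection(h, d))
print(steered, projection(steered, d))
```

Choosing `alpha` as the negative of the current projection removes the component along the direction entirely; smaller magnitudes shift the activation more gently, which is one way to trade steering strength against disruption to the rest of the representation.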

Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
Jiali Chen | Xusen Hei | Yuqi Xue | Zihan Wu | Jiayuan Xie | Yi Cai
Findings of the Association for Computational Linguistics: NAACL 2025

Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt Chinese literary classics into engaging and accessible text for children. However, recent large language models (LLMs) overlook children’s reading preferences (i.e., vivid character portrayals, concise narrative structures, and appropriate readability with simpler words and sentences), which poses challenges for CLA. In this paper, we propose a method called InstructChild, which augments the LLM with these preferences for adaptation. Specifically, we first obtain the characters’ personalities and the narrative structure as additional information for fine-grained instruction tuning. Then, we devise a readability metric as the reward to align the LLM with children’s reading level. Finally, a lookahead decoding strategy is applied during inference to improve the readability of the generated text. To support evaluation of the CLA task, we construct the Classic4Children dataset, which comprises both the original and child-friendly versions of the Four Great Classical Novels of Chinese literature. Experimental results show that InstructChild significantly improves performance in both automatic and human evaluation.

RTADev: Intention Aligned Multi-Agent Framework for Software Development
Jie Liu | Guohua Wang | Ronghui Yang | Jiajie Zeng | Mengchen Zhao | Yi Cai
Findings of the Association for Computational Linguistics: ACL 2025

LLM-based multi-agent frameworks have shown great potential in solving real-world software development tasks, where agents in different roles can communicate much more efficiently than humans. Despite this efficiency, LLM-based agents often fail to fully understand each other, which frequently causes errors during the development process. Moreover, accumulated errors can easily lead to the failure of the whole project. To reduce such errors, we introduce RTADev, an intention-aligned multi-agent framework that uses a self-correction mechanism to ensure that all agents work from a consensus. RTADev mimics human teams, where individuals are free to start a meeting at any time to reach agreement. Specifically, RTADev integrates an alignment checking phase and a conditional ad hoc group review phase, so that errors can be effectively reduced with minimal agent communication. Our experiments on various software development tasks show that RTADev significantly improves the quality of the generated software code in terms of executability and structural and functional completeness. The code of our project is available at https://github.com/codeagent-rl/RTADev.

RuleEdit: Towards Rule-Level Knowledge Generalization to Mitigate Over-Editing in Large Language Models
Bihan Zhou | HaoPeng Ren | Li Yuan | Yi Cai | Liuwen Cao | Zikun Deng
Findings of the Association for Computational Linguistics: ACL 2025

Knowledge editing has emerged as a promising approach for updating target knowledge in Large Language Models (LLMs) in a timely manner, thereby preventing undesirable behaviors stemming from outdated, inaccurate, or incomplete knowledge. However, existing methods mainly focus on instance-level editing, which is prone to over-editing, featuring knowledge degradation and deterioration of general abilities caused by redundant instance-specific modifications. To mitigate this over-editing risk, we explore the rule-level editing problem, which avoids case-by-case modification by generalizing rule-level knowledge to update rule-derived instances. We further construct a benchmark called RuleEdit for systematic evaluation of rule-level editing. Moreover, we propose a Rule-Transfer Editing (RTE) method to facilitate effective updates and generalization of rule-level knowledge in LLMs. Experimental results show significant improvements over the best-performing baselines for LLaMA-2-7B on RULE_mix: 28.1% in portability and 8.1% in average performance.

Sequence Structure Aware Retriever for Procedural Document Retrieval: A New Dataset and Baseline
Zhenqi Ye | HaoPeng Ren | Yi Cai | Qingbao Huang | Jing Qin | Pinli Zhu | Songwen Gong
Findings of the Association for Computational Linguistics: EMNLP 2025

Execution failures are common in daily life when individuals perform procedural tasks, such as cooking or making handicrafts. Retrieving relevant procedural documents that closely align with both the content of the steps and the overall execution sequence can help correct these failures with fewer modifications. However, existing retrieval methods, which primarily focus on declarative knowledge, often neglect the execution sequence structures inherent in procedural documents. To tackle this challenge, we introduce a new dataset, Procedural Questions, and propose a retrieval model, Graph-Fusion Procedural Document Retriever (GFPDR), which integrates procedural graphs with document representations. Extensive experiments demonstrate the effectiveness of GFPDR, showing superior performance in procedural document retrieval compared with existing models.

MRCD: Multi-disciplinary RAG-Enhanced Collaborative Debate for Medical Question Answering
Dayong Liang | Yi Cai | Zhiyuan Wen
Proceedings of the First Workshop of LLM Reasoning on Medicine: Challenges, Opportunities, and Future

2024

Updating Large Language Models’ Memories with Time Constraints
Xin Wu | Yuqi Bu | Yi Cai | Tao Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

By incorporating the latest external knowledge, large language models (LLMs) can modify their internal memory. However, in practical applications, LLMs may encounter outdated information, which must be filtered out while knowledge beyond internal memory is updated. This paper explores whether LLMs can selectively update their memories based on the time constraints between internal memory and external knowledge. We evaluate existing LLMs using three types of data that exhibit different time constraints. Our experimental results reveal the challenges most LLMs face with time-constrained knowledge and highlight differences in how various LLMs handle such information. Additionally, to address the difficulties LLMs have in understanding time constraints, we propose a two-stage decoupling framework that separates the identification and computation of time constraints into a symbolic system. Experimental results demonstrate that the proposed framework improves ChatGPT’s performance by over 60% and that of the state-of-the-art LLM GPT-4 by 12-24%.

Abstract-level Deductive Reasoning for Pre-trained Language Models
Xin Wu | Yi Cai | Ho-fung Leung
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Pre-trained language models (PLMs) have been shown to be able to emulate deductive reasoning in natural language. However, PLMs are easily affected by irrelevant information (e.g., entities) in instance-level proofs when learning deductive reasoning. To address this limitation, we propose an Abstract-level Deductive Reasoner (ADR). ADR is trained to predict the abstract reasoning proof of each sample, which guides PLMs to learn general reasoning patterns rather than instance-level knowledge. Experimental results demonstrate that ADR significantly reduces the impact of PLMs learning instance-level knowledge (by over 70%).

A Logical Pattern Memory Pre-trained Model for Entailment Tree Generation
Li Yuan | Yi Cai | Haopeng Ren | Jiexin Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Generating coherent and credible explanations remains a significant challenge in the field of AI. In recent years, researchers have delved into the utilization of entailment trees to depict explanations, which exhibit a reasoning process of how a hypothesis is deduced from the supporting facts. However, existing models often overlook the importance of generating intermediate conclusions with logical consistency from the given facts, leading to inaccurate conclusions and undermining the overall credibility of entailment trees. To address this limitation, we propose the logical pattern memory pre-trained model (LMPM). LMPM incorporates an external memory structure to learn and store the latent representations of logical patterns, which aids in generating logically consistent conclusions. Furthermore, to mitigate the influence of logically irrelevant domain knowledge in the Wikipedia-based data, we introduce an entity abstraction approach to construct the dataset for pre-training LMPM. The experimental results highlight the effectiveness of our approach in improving the quality of entailment tree generation. By leveraging logical entailment patterns, our model produces more coherent and reasonable conclusions that closely align with the underlying premises.

Beyond Code: Evaluate Thought Steps for Complex Code Generation
Liuwen Cao | Yi Cai | Jiexin Wang | Hongkui He | Hailin Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Code generation aims to generate code in a general-purpose programming language, such as C++, from natural language intents. Existing efforts primarily focus on relatively simple programming problems and fail to evaluate the thought process involved in complex programming scenarios. In this paper, we introduce “steps-guided code generation,” a task that assesses the quality of both the thought steps and the code implementation to evaluate how well a complex programming problem is handled overall. To support this task, we construct CodeStepsEval, a real-world scenario dataset of complex C++ programming problems with varying levels of difficulty. Comprehensive experiments on this dataset demonstrate the importance of high-quality steps in enhancing code generation performance and the challenges code LLMs face in this task.

Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline
Haopeng Ren | Yushi Zeng | Yi Cai | Zhenqi Ye | Li Yuan | Pinli Zhu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Much of the commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing work often focuses on procedural text and ignores a common multimodal scenario in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity suffered in the text-only modality. Motivated by this, we explore the problem of grounded multimodal procedural entity recognition (GMPER), which aims to detect entities and their corresponding bounding-box groundings in images (i.e., visual entities). A new dataset (Wiki-GMPER) is built, and extensive experiments are conducted to evaluate the effectiveness of our proposed model.

Knowledge-Guided Cross-Topic Visual Question Generation
Hongfei Liu | Guohua Wang | Jiayuan Xie | Jiali Chen | Wenhao Fang | Yi Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Visual question generation (VQG) task aims to generate high-quality questions based on the input image. Current methods primarily focus on generating questions containing specified content utilizing answers or question types as constraints. However, these constraints make it challenging to control the topic of generated questions (e.g., conversation or test subject topics) for various applications. Thus, it is necessary to utilize topics as constraints to guide question generation. Considering that there are many topics and it is almost impossible for human annotations to cover them, we propose the cross-topic learning VQG (CTL-VQG) task, which aims to generate questions related to unseen topics in cross-topic scenarios. In this paper, we propose a knowledge-guided cross-topic visual question generation (KC-VQG) model to extract unseen topic-related information for question generation. Specifically, an image-topic feature extractor is introduced in our model to extract topic-related intuitive visual features; an image-topic knowledge extractor is used to extract and select the most appropriate topic-related implicit knowledge from large language models for generating questions. Extensive experiments show that our model outperforms baselines and can effectively generate unseen topic-related questions in cross-topic scenarios.

Step Feasibility-Aware and Error-Correctable Entailment Tree Generation
Junyue Song | Xin Wu | Yi Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

An entailment tree is a structured reasoning path that clearly demonstrates the process of deriving hypotheses through multiple steps of inference from known premises. It enhances the interpretability of QA systems. Existing methods for generating entailment trees typically employ iterative frameworks to ensure reasoning faithfulness. However, they often suffer from the issue of false feasible steps, where selected steps appear feasible but actually lead to incorrect intermediate conclusions. Moreover, the existing iterative frameworks do not consider error-prone search branches, resulting in error propagation. In this work, we propose SPEH: an iterative entailment tree generation framework with Step feasibility Perception and state Error Handling mechanisms. Step Feasibility Perception enables the model to learn how to choose steps that are not false feasible. State Error Handling includes error detection and backtracking, allowing the model to correct errors when entering incorrect search branches. Experimental results demonstrate the effectiveness of our approach in improving the generation of entailment trees.

2023

Rethinking Multimodal Entity and Relation Extraction from a Translation Point of View
Changmeng Zheng | Junhao Feng | Yi Cai | Xiaoyong Wei | Qing Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We revisit multimodal entity and relation extraction from a translation point of view. Special attention is paid to the misalignment issue in text-image datasets, which may mislead learning. We are motivated by the fact that cross-modal misalignment is similar to the cross-lingual divergence issue in machine translation. The problem can then be transformed, and existing solutions can be borrowed, by treating a text and its paired image as translations of each other. We implement a multimodal back-translation using diffusion-based generative models for pseudo-parallel pairs, and a divergence estimator by constructing a high-resource corpus as a bridge for low-resource learners. Fine-grained confidence scores are generated to indicate both the types and the degrees of alignment, with which better representations are obtained. The method has been validated experimentally, outperforming 14 state-of-the-art methods in both entity and relation extraction tasks. The source code is available at https://github.com/thecharm/TMR.

CLEVR-Implicit: A Diagnostic Dataset for Implicit Reasoning in Referring Expression Comprehension
Jingwei Zhang | Xin Wu | Yi Cai
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recently, pre-trained vision-language (VL) models have achieved remarkable success in various cross-modal tasks, including referring expression comprehension (REC). These models are pre-trained on large-scale image-text pairs to learn the alignment between words in textual descriptions and objects in the corresponding images, and are then fine-tuned on downstream tasks. However, the performance of VL models is hindered when dealing with implicit text, which describes objects through comparisons between two or more objects rather than explicitly mentioning them, because the models struggle to align implicit text with the objects in the images. To address this challenge, we introduce CLEVR-Implicit, a dataset consisting of synthetic images and two types of corresponding implicit text for the REC task. Additionally, to enhance the performance of VL models on implicit text, we propose a method called Transforming Implicit text into Explicit text (TIE), which enables VL models to reason with implicit text. TIE consists of two modules: (1) the prompt design module builds prompts for implicit text by adding masked tokens, and (2) the cloze procedure module fine-tunes the prompts by using masked language modeling (MLM) to predict the explicit words from the implicit prompts. Experimental results on our dataset show a significant improvement of 37.94% in the performance of VL models on implicit text after applying TIE.

Improving Named Entity Recognition via Bridge-based Domain Adaptation
Jingyun Xu | Changmeng Zheng | Yi Cai | Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2023

Recent studies have shown remarkable success in cross-domain named entity recognition (cross-domain NER). Despite the promising results, existing methods mainly use pre-trained language models such as BERT to represent words. The resulting original, chaotic representations may make it difficult for these methods to distinguish entity types, leading to entity type misclassification. To this end, we use contrastive learning to refine the original representations and propose a model-agnostic framework named MoCL for cross-domain NER. Additionally, we combine MoCL with two distinct cross-domain NER methods and two pre-trained language models to explore its generalization ability. Empirical results on seven domains show the effectiveness and good generalization ability of MoCL.

Constructing Procedural Graphs with Multiple Dependency Relations: A New Dataset and Baseline
Haopeng Ren | Yushi Zeng | Yi Cai | Bihan Zhou | Zetao Lian
Findings of the Association for Computational Linguistics: ACL 2023

Current structured and semi-structured knowledge bases mainly focus on representing descriptive knowledge but ignore another type of commonsense knowledge, procedural knowledge. To structure procedural knowledge, existing methods have been proposed to automatically generate flow graphs from procedural documents. They focus on extracting sequential dependencies between sentences but neglect two other important dependencies in procedural documents (i.e., inclusion dependency and constraint dependency). In this paper, we explore the problem of automatically generating procedural graphs with multiple dependency relations, extending the flow graphs constructed by existing methods, and propose a procedural graph construction method that uses syntactic information and discourse structures. A new dataset (WHPG) is built, and extensive experiments are conducted to evaluate the effectiveness of our proposed model.

Segment-Level and Category-Oriented Network for Knowledge-Based Referring Expression Comprehension
Yuqi Bu | Xin Wu | Liuwu Li | Yi Cai | Qiong Liu | Qingbao Huang
Findings of the Association for Computational Linguistics: ACL 2023

Knowledge-based referring expression comprehension (KB-REC) aims to identify visual objects referred to by expressions that incorporate knowledge. Existing methods employ sentence-level retrieval and fusion methods, which may lead to issues of similarity bias and interference from irrelevant information in unstructured knowledge sentences. To address these limitations, we propose a segment-level and category-oriented network (SLCO). Our approach includes a segment-level and prompt-based knowledge retrieval method to mitigate the similarity bias problem and a category-based grounding method to alleviate interference from irrelevant information in knowledge sentences. Experimental results show that our SLCO can eliminate interference and improve the overall performance of the KB-REC task.

Scene Graph Enhanced Pseudo-Labeling for Referring Expression Comprehension
Cantao Wu | Yi Cai | Liuwu Li | Jiexin Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Referring Expression Comprehension (ReC) is a task that involves localizing objects in images based on natural language expressions. Most ReC methods typically approach the task as a supervised learning problem. However, the need for costly annotations, such as clear image-text pairs or region-text pairs, hinders the scalability of existing approaches. In this work, we propose a novel scene graph-based framework that automatically generates high-quality pseudo region-query pairs. Our method harnesses scene graphs to capture the relationships between objects in images and generate expressions enriched with relation information. To ensure accurate mapping between visual regions and text, we introduce an external module that employs a calibration algorithm to filter out ambiguous queries. Additionally, we employ a rewriter module to enhance the diversity of our generated pseudo queries through rewriting. Extensive experiments demonstrate that our method outperforms previous pseudo-labeling methods by about 10%, 12%, and 11% on RefCOCO, RefCOCO+, and RefCOCOg, respectively. Furthermore, it surpasses the state-of-the-art unsupervised approach by more than 15% on the RefCOCO dataset.

2022

Towards Exploiting Sticker for Multimodal Sentiment Analysis in Social Media: A New Dataset and Baseline
Feng Ge | Weizhao Li | Haopeng Ren | Yi Cai
Proceedings of the 29th International Conference on Computational Linguistics

Sentiment analysis in social media is challenging because posts often lack context. As a popular way to express emotion on social media, stickers related to these posts can supplement missing sentiment and help identify sentiment precisely. However, stickers have not been investigated in depth. To this end, we present a Chinese sticker-based multimodal dataset for the sentiment analysis task (CSMSA). Compared with previous multimodal datasets based on real-world photos, the CSMSA dataset focuses on stickers, which convey more vivid and moving emotions. The sticker-based multimodal sentiment analysis task is challenging in three respects: the inherent multimodality of stickers, significant inter-series variation between stickers, and complex multimodal sentiment fusion. We propose SAMSAM to address these three challenges. Our model introduces a flexible masked self-attention mechanism that allows dynamic interaction between post texts and stickers. Experimental results indicate that our model performs best among the compared models. More research is needed in this field. The dataset is publicly available at https://github.com/Logos23333/CSMSA.

Mitigating Contradictions in Dialogue Based on Contrastive Learning
Weizhao Li | Junsheng Kong | Ben Liao | Yi Cai
Findings of the Association for Computational Linguistics: ACL 2022

Chatbot models have achieved remarkable progress in recent years but tend to yield contradictory responses. In this paper, we exploit contrastive learning to mitigate this issue. To endow the model with the ability to discriminate contradictory patterns, we minimize the similarity between the target response and a contradiction-related negative example. The negative example is generated with learnable latent noise, which receives contradiction-related feedback from the pretrained critic. Experimental results show that our method helps avoid contradictions in response generation while preserving response fluency, outperforming existing methods in both automatic and human evaluation.
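One common way to express "pull the output toward the target, push it away from a negative" is an InfoNCE-style contrastive loss over cosine similarities. The sketch below is a generic illustration under that assumption, not the paper's objective: the function names, the temperature `tau`, and the toy vectors standing in for response representations are hypothetical, and neither the latent-noise generation of negatives nor the critic feedback is modeled.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def contrastive_loss(anchor: Vector, positive: Vector,
                     negatives: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style loss (assumed form): small when the anchor is similar to the
    positive and dissimilar to every negative; tau sharpens the softmax."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical representations: model output, target response, contradictory response.
output_repr = [1.0, 0.0]
target_repr = [1.0, 0.1]
contradiction_repr = [-1.0, 0.0]

good = contrastive_loss(output_repr, target_repr, [contradiction_repr])
bad = contrastive_loss(output_repr, contradiction_repr, [target_repr])
print(good, bad)  # the loss is much lower when the output sides with the target
```

Minimizing such a loss lowers the similarity to the negative relative to the target; the temperature controls how sharply near-miss negatives are penalized.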

2021

ChicHealth @ MEDIQA 2021: Exploring the limits of pre-trained seq2seq models for medical summarization
Liwen Xu | Yan Zhang | Lei Hong | Yi Cai | Szui Sung
Proceedings of the 20th Workshop on Biomedical Language Processing

In this article, we describe our system for the MEDIQA 2021 shared tasks. First, we describe our method for the second task, multiple answer summarization (MAS). For extracting abstracts, we follow the rules of (CITATION). The candidate sentences are first roughly estimated using the RoBERTa model, and a Markov chain model is then used to evaluate the sentences in a fine-grained manner. Our team won first place in overall performance, with fourth place in the MAS task, seventh place in the RRS task, and eleventh place in the QS task. For the QS and RRS tasks, we investigate the performance of end-to-end pre-trained seq2seq models. Experiments show that adversarial training and back-translation are beneficial for improving fine-tuning performance.

IgSEG: Image-guided Story Ending Generation
Qingbao Huang | Chuan Huang | Linzhang Mo | Jielong Wei | Yi Cai | Ho-fung Leung | Qing Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Aligned Dual Channel Graph Convolutional Network for Visual Question Answering
Qingbao Huang | Jielong Wei | Yi Cai | Changmeng Zheng | Junying Chen | Ho-fung Leung | Qing Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Visual question answering aims to answer a natural language question about a given image. Existing graph-based methods focus only on the relations between objects in an image and neglect the importance of the syntactic dependency relations between words in a question. To capture both kinds of relations simultaneously, we propose a novel dual channel graph convolutional network (DC-GCN) that better combines visual and textual advantages. The DC-GCN model consists of three parts:

- an I-GCN module that captures the relations between objects in an image;
- a Q-GCN module that captures the syntactic dependency relations between words in a question;
- an attention alignment module that aligns image representations and question representations.

Experimental results show that our model achieves performance comparable to state-of-the-art approaches.

A Two-phase Prototypical Network Model for Incremental Few-shot Relation Classification
Haopeng Ren | Yi Cai | Xiaofeng Chen | Guohua Wang | Qing Li
Proceedings of the 28th International Conference on Computational Linguistics

Relation Classification (RC) plays an important role in natural language processing (NLP). Conventional supervised and distantly supervised RC models make a closed-world assumption, which ignores the emergence of novel relations in an open environment. Two current solutions, re-training and lifelong learning, have been designed to recognize novel relations incrementally. However, both suffer from the lack of large-scale labeled data for novel relations. Meanwhile, prototypical networks perform well in both deep supervised learning and few-shot learning. However, they still suffer from the incompatible feature embedding problem when novel relations arrive. Motivated by these observations, we propose a two-phase prototypical network with prototype attention alignment and triplet loss. It dynamically recognizes novel relations from a few support instances while avoiding catastrophic forgetting. Extensive experiments are conducted to evaluate the effectiveness of our proposed model.

Controllable Abstractive Sentence Summarization with Guiding Entities
Changmeng Zheng | Yi Cai | Guanjie Zhang | Qing Li
Proceedings of the 28th International Conference on Computational Linguistics

Entities make up a major proportion of text summaries and build up their topics. Although existing text summarization models can produce promising results on automatic metrics such as ROUGE, they cannot guarantee that a given entity is contained in the generated summaries. In this paper, we propose a controllable abstractive sentence summarization model that generates summaries with guiding entities. Instead of generating a summary from left to right, we start with a selected entity, generate the left part first, and then generate the right part of the complete summary. Previous entity-based summarization models generate the complete sentence from implicit entity and article representations. In contrast, our method ensures that the entities appear in the final output summaries. Our model can also generate more novel entities by incorporating them into outputs directly. To evaluate the informativeness of the proposed model, we develop fine-grained informativeness metrics from the perspectives of relevance, extraness, and omission. We conduct experiments on two widely used sentence summarization datasets. Experimental results show that our model outperforms state-of-the-art methods on both automatic evaluation scores and informativeness metrics.

Task-oriented Domain-specific Meta-Embedding for Text Classification
Xin Wu | Yi Cai | Yang Kai | Tao Wang | Qing Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Meta-embedding learning combines complementary information from different word embeddings and has shown superior performance across various Natural Language Processing tasks. However, existing meta-embedding methods still ignore domain-specific knowledge, which results in unstable performance across specific domains. Moreover, the relative importance of general and domain word embeddings depends on the downstream task. How to regularize meta-embeddings to adapt to downstream tasks therefore remains an unsolved problem. In this paper, we propose a method to incorporate both domain-specific and task-oriented information into meta-embeddings. We conducted extensive experiments on four text classification datasets, and the results show the effectiveness of our proposed method.

TSDG: Content-aware Neural Response Generation with Two-stage Decoding Process
Junsheng Kong | Zhicheng Zhong | Yi Cai | Xin Wu | Da Ren
Findings of the Association for Computational Linguistics: EMNLP 2020

Neural response generative models have achieved remarkable progress in recent years but tend to yield irrelevant and uninformative responses. One reason is that encoder-decoder models use a single decoder to generate a complete response in one pass. This tends to produce high-frequency function words that carry little semantic information, rather than low-frequency content words that carry more. To address this issue, we propose a content-aware model with a two-stage decoding process, named Two-stage Dialogue Generation (TSDG). We separate the decoding of content words and function words, so that content words can be generated independently without interference from function words. Experimental results on two datasets indicate that our model significantly outperforms several competitive generative models in both automatic and human evaluation.

2019

A Boundary-aware Neural Model for Nested Named Entity Recognition
Changmeng Zheng | Yi Cai | Jingyun Xu | Ho-fung Leung | Guandong Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In natural language processing, entities commonly contain other entities inside them. Most existing work on named entity recognition (NER) handles only flat entities and ignores nested ones. We propose a boundary-aware neural model for nested NER that uses entity boundaries to predict entity categorical labels. Our model locates entities precisely by detecting boundaries with sequence labeling models. Based on the detected boundaries, it uses the boundary-relevant regions to predict entity categorical labels. This reduces computation cost and alleviates the error propagation problem of layered sequence labeling models. We introduce multitask learning to capture the dependencies between entity boundaries and their categorical labels, which helps improve entity identification. We conduct experiments on the GENIA dataset, and the results demonstrate that our model outperforms other state-of-the-art methods.

Recognizing Conflict Opinions in Aspect-level Sentiment Classification with Dual Attention Networks
Xingwei Tan | Yi Cai | Changxi Zhu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect-level sentiment classification is a fine-grained sentiment analysis task that has received considerable attention in recent years. People sometimes express both positive and negative sentiments towards an aspect at the same time. Existing studies, however, ignore such opinions with conflicting sentiments and design their models on the assumption that they do not occur. We argue that excluding conflict opinions is problematic because they represent an important style of human thinking: dialectic thinking. If a real-world sentiment classification system is designed without accounting for conflict opinions, in practice it will incorrectly mix them into other sentiment polarity categories. Existing models also face problems when recognizing conflict opinions, such as data sparsity. In this paper, we propose a multi-label classification model with a dual attention mechanism to address these problems.

2016

Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation
Kai Yang | Yi Cai | Zhenhong Chen | Ho-fung Leung | Raymond Lau
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Latent Dirichlet Allocation (LDA) and its variants have been widely used to discover latent topics in textual documents. However, some of the topics generated by LDA may be noisy, with irrelevant words scattered across them. We call such words topic-indiscriminate words. They tend to make topics more ambiguous and less interpretable by humans. In our work, we propose a new topic model named TWLDA, which assigns low weights to words with low topic discriminating power (ability). Our experimental results show that the proposed approach effectively reduces the number of topic-indiscriminate words in discovered topics and improves the effectiveness of LDA.

Co-authors

Junsheng Kong 2

Thibaud Ardoin 1

Xiaofeng Chen 1

Zhenhong Chen 1

Mengqiu Cheng 1

Tat-Seng Chua 1

Mengzhen Wang 1

Yuancheng Wei 1

Gerhard Wunder 1

Guanjie Zhang 1

Jingwei Zhang 1

Xinting Zhang 1

Mengchen Zhao 1

Zhicheng Zhong 1

Venues