Yi Li

2025

Change Entity-guided Heterogeneous Representation Disentangling for Change Captioning
Yi Li | Yunbin Tu | Liang Li | Li Su | Qingming Huang
Findings of the Association for Computational Linguistics: ACL 2025

Change captioning aims to describe differences between a pair of images using natural language. However, learning effective difference representations is highly challenging due to distractors such as illumination and viewpoint changes. To address this, we propose a change-entity-guided disentanglement network that explicitly learns difference representations while mitigating the impact of distractors. Specifically, we first design a change entity retrieval module to identify key objects involved in the change from a textual perspective. Then, we introduce a difference representation enhancement module that strengthens the learned features, disentangling genuine differences from background variations. To further refine the generation process, we incorporate a gated Transformer decoder, which dynamically integrates both visual difference and textual change-entity information. Extensive experiments on the CLEVR-Change, CLEVR-DC, and Spot-the-Diff datasets demonstrate that our method outperforms existing approaches, achieving state-of-the-art performance. The code is available at https://github.com/yili-19/CHEER

Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models
Jian Jornbowrl Wang | Xiaofei Xie | Qiang Hu | Shangqing Liu | Yi Li
Findings of the Association for Computational Linguistics: EMNLP 2025

Code Large Language Models (Code LLMs) have opened a new era in programming with their impressive capabilities. However, recent research has revealed critical limitations in their ability to reason about runtime behavior and understand the actual functionality of programs, which poses significant challenges for their post-training and practical deployment. Specifically, Code LLMs encounter two principal issues: (1) a lack of proficiency in reasoning about program execution behavior, as they struggle to interpret what programs actually do during runtime, and (2) inconsistent and fragmented representation of semantic information, such as execution traces, across existing methods, which hinders their ability to generalize and reason effectively. These challenges underscore the necessity for more systematic approaches to enhance the reasoning capabilities of Code LLMs. To address these issues, we introduce a generic framework to support integrating semantic information (e.g., execution traces) into code task-relevant prompts, and conduct a comprehensive study to explore the role of semantic information in enhancing the reasoning ability of Code LLMs accordingly. Specifically, we focus on investigating the usefulness of trace-based semantic information in boosting supervised fine-tuning (SFT) and post-phase inference of Code LLMs. The experimental results surprisingly disagree with previous works and demonstrate that semantic information has limited usefulness for SFT and test-time scaling of Code LLMs.

2024

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
Guisheng Liu | Yi Li | Zhengcong Fei | Haiyan Fu | Xiangyang Luo | Yanqing Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose a lightweight image captioning network in combination with continuous diffusion, called Prefix-diffusion. To achieve diversity, we design an efficient method that injects prefix image embeddings into the denoising process of the diffusion model. In order to reduce trainable parameters, we employ a pre-trained model to extract image features and further design an extra mapping network. Prefix-diffusion is able to generate diverse captions with relatively fewer parameters, while maintaining the fluency and relevance of the captions, benefiting from the generative capabilities of the diffusion model. Our work paves the way for scaling up diffusion models for image captioning, and achieves promising performance compared with recent approaches.

Machine translation (MT) evaluation has evolved toward finer granularity, enabling a more precise diagnosis of hidden flaws and weaknesses of MT systems from various perspectives. This paper examines how MT systems are potentially affected by certain grammatical features, offering insights into the challenges these features pose and suggesting possible directions for improvement. We develop a new test suite by extracting 7,848 sentences from a multi-domain Chinese-English parallel corpus. All the Chinese text was further annotated with 43 grammatical features using a semi-automatic method. This test suite was subsequently used to evaluate eight state-of-the-art MT systems according to six different automatic evaluation metrics. The results reveal intriguing patterns of MT performance associated with different domains and various grammatical features, highlighting the test suite’s effectiveness. The test suite was made publicly available and will serve as an important benchmark for evaluating and diagnosing Chinese-English MT systems.

2022

Multi-Attribute Controlled Text Generation with Contrastive-Generator and External-Discriminator
Guisheng Liu | Yi Li | Yanqing Guo | Xiangyang Luo | Bo Wang
Proceedings of the 29th International Conference on Computational Linguistics

Though existing research has achieved impressive results in controlled text generation, it focuses mainly on single-attribute control. However, in applications like automatic comments, the topic and sentiment need to be controlled simultaneously. In this work, we propose a new framework for multi-attribute controlled text generation. To achieve this, we design a contrastive-generator that can effectively generate texts with more attributes. In order to increase the convergence of the text on the desired attributes, we adopt an external-discriminator to distinguish whether the generated text holds the desired attributes. Moreover, we propose top-n weighted decoding to further improve the relevance of texts to attributes. Automated evaluations and human evaluations show that our framework achieves remarkable controllability in multi-attribute generation while keeping the text fluent and diverse. It also yields promising performance on zero-shot generation.

CLGC: A Corpus for Chinese Literary Grace Evaluation
Yi Li | Dong Yu | Pengyuan Liu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we construct a Chinese literary grace corpus, CLGC, with 10,000 texts and more than 1.85 million tokens. Multi-level annotations are provided for each text in our corpus, including literary grace level, sentence category, and figure-of-speech type. Based on the corpus, we dig deep into the correlation between fine-grained features (semantic information, part-of-speech, figure-of-speech, etc.) and literary grace level. We also propose a new Literary Grace Evaluation (LGE) task, which aims at making a comprehensive assessment of the literary grace level according to the text. In the end, we build some classification models with machine learning algorithms (such as SVM and TextCNN) to prove the effectiveness of our features and corpus for LGE. Our preliminary classification experiments achieve a weighted-average F1-score of 79.71%.

2018

Large Margin Neural Language Model
Jiaji Huang | Yi Li | Wei Ping | Liang Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a large margin criterion for training neural language models. Conventionally, neural language models are trained by minimizing perplexity (PPL) on grammatical sentences. However, we demonstrate that PPL may not be the best metric to optimize in some tasks, and further propose a large margin formulation. The proposed method aims to enlarge the margin between the “good” and “bad” sentences in a task-specific sense. It is trained end-to-end and can be widely applied to tasks that involve re-scoring of generated text. Compared with minimum-PPL training, our method gains up to 1.1 WER reduction for speech recognition and 1.0 BLEU increase for machine translation.
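The idea of enlarging the margin between "good" and "bad" sentences can be illustrated with a pairwise hinge loss. This is only a minimal sketch, not the formulation from the paper: the function name `pairwise_margin_loss`, the default margin of 1.0, and the assumption that each sentence already has a scalar score from the language model (for example, a log-probability) are all hypothetical choices made for illustration.

```python
from typing import Sequence


def pairwise_margin_loss(
    good_scores: Sequence[float],
    bad_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Average hinge loss over every (good, bad) sentence pair.

    Assumption: a higher score means the model prefers the sentence.
    A pair contributes zero loss once the good sentence outscores the
    bad one by at least `margin`; otherwise the shortfall is penalized.
    """
    if not good_scores or not bad_scores:
        raise ValueError("both score lists must be non-empty")
    total = 0.0
    for good in good_scores:
        for bad in bad_scores:
            # Hinge term: positive only while the gap is below the margin.
            total += max(0.0, margin - (good - bad))
    return total / (len(good_scores) * len(bad_scores))
```

Minimizing a loss of this shape pushes the scores of task-preferred candidates above those of the others, which is the kind of objective that suits re-scoring of generated text; how the paper defines "good" and "bad" candidates and the exact margin term is not specified in the abstract.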
