Yifan Li

2025

本技术报告探讨了通过微调本地视觉语言模型,实现汉字硬笔书写质量自动评价的技术方案。针对传统评价方法难以提供准确性反馈的问题,我们团队采用精心设计的prompt并结合微调的方式构建了一个高效的汉字硬笔书写质量自动评价系统。我们采用Qwen2.5-VL-7B-Instruct模型作为基础,通过LoRA微调技术实现了汉字书写质量等级分类(子任务一)和个性化评语生成(子任务二)的功能。系统地融合了视觉特征分析与语言生成能力,在训练过程中采用了梯度检查点、BF16混合精度训练等技术优化显存使用,并设计了针对性的损失函数和评估指标。实验结果表明,我们的方法能够有效实现汉字书写质量的细粒度评价。

pdf bib abs

CCL25-Eval任务1系统报告:使用思维链和投票集成增强大型语言模型空间语义理解
LiuHaixin LiuHaixin | Hongying Zan | Jinwang Song | Yifan Li | KongLulu KongLulu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"本技术报告详细介绍了我们团队在第五届空间语义理解评测(SpaCE2025)中的方法与成果。SpaCE2025 继续聚焦大语言模型在空间语义理解方面的能力评估,涵盖空间语言理解与空间推理两个核心维度,共设置五个子任务:空间信息正误判断、空间参照实体判断、空间异形同义判断、中文空间方位关系推理以及英文空间方位关系推理。我们通过设计结构化提示词并引入思维链推理机制,结合LoRA 微调技术和投票集成方法,有效提升了大语言模型在空间语义理解任务中的表现。在最终评测中,我们团队五个子任务的综合准确率为0.5983,整体排名第五。"

pdf bib abs

Real-world vision-language applications demand varying levels of perceptual granularity. However, most existing visual large language models (VLLMs), such as LLaVA, pre-assume a fixed resolution for downstream tasks, which leads to subpar performance. To address this problem, we first conduct a comprehensive and pioneering investigation into the resolution preferences of different vision-language tasks, revealing a correlation between resolution preferences with (1) image complexity, and (2) uncertainty variance of the VLLM at different image input resolutions. Building on this insight, we propose an empirical formula to determine the optimal resolution for a given vision-language task, accounting for these two factors as the zeroth-order and first-order terms in the Taylor expansion on a given image input. Second, based on rigorous experiments, we propose a novel parameter-efficient fine-tuning technique to extend the visual input resolution of pre-trained VLLMs to the identified optimal resolution. Extensive experiments on various vision-language tasks validate the effectiveness of our method.

pdf bib abs

Solving complex reasoning tasks is a key real-world application of agents. Thanks to the pretraining of Large Language Models (LLMs) on code data, recent approaches like CodeAct successfully use code as LLM agents’ action, achieving good results. However, CodeAct greedily generates the next action’s code block by relying on fragmented thoughts, resulting in inconsistency and accumulative hallucination. Moreover, CodeAct lacks action-related ground-truth (GT), making its supervision signals and termination conditions questionable in multi-turn interactions. To address these issues, we propose Tree-of-Code (ToC), a self-growing framework that generates nodes through self-supervision, incorporating prompt and model exploration in a GT-free setting. Each node employs CodeProgram, an end-to-end code generation paradigm that aligns executable code logic with global reasoning. This approach uses task-level execution success as both node validity and stop-growing flags, bypassing process supervision to enable online applications. Experiments on two datasets with ten popular zero-shot LLMs show that ToC boosts accuracy by nearly 20% over CodeAct with fewer than 1/4 turns. To further investigate the trade-off between efficacy and efficiency, ablation studies on different ToC tree sizes and exploration mechanisms validate ToC’s superiority.

pdf bib

A corpus-assisted metaphor analysis in portraying businesswomen: Diachronic changes in Hong Kong English news
Yifan Li | Yanlin Li | Kathleen Ahrens
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

2024

pdf bib abs

Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionally changing the model’s behavior. This threat is critical as it mirrors real-world usage scenarios where RAG systems interact with publicly accessible knowledge bases, such as web scrapings and user-contributed data pools. To be more realistic, we target a realistic setting where the adversary has no knowledge of users’ queries, knowledge base data, and the LLM parameters. We demonstrate that it is possible to exploit the model successfully through crafted content uploads with access to the retriever. Our findings emphasize an urgent need for security measures in the design and deployment of RAG systems to prevent potential manipulation and ensure the integrity of machine-generated content.

pdf bib abs

Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation
Kun Zhou | Yifan Li | Xin Zhao | Ji-Rong Wen
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, continuous diffusion models (CDM) have been introduced into non-autoregressive (NAR) text-to-text generation. However, the discrete nature of text increases the difficulty of CDM to generate coherent and fluent texts, and also causes the incompatibility problem between CDM and advanced NLP techniques, especially the popular pre-trained language models (PLMs).To solve it, we propose Diffusion-NAT, which introduces discrete diffusion models (DDM) into NAR text-to-text generation and integrates BART to improve the performance.By revising the decoding process of BART and the typical settings of DDM, we unify the inference process of BART and the denoising process of DDM into the same NAR masked tokens recovering task.In this way, DDM can rely on BART to perform denoising, which can benefit from both the rich pre-learned knowledge of BART and the iterative refining paradigm of DDM.Besides, we also propose the iterative self-prompting strategy to further improve the generation quality.Experimental results on 7 datasets show that our approach can outperform competitive NAR methods, and even surpass autoregressive methods.Our code and data are released at https://github.com/RUCAIBox/DiffusionNAT.

2023

pdf bib abs

Inspired by the superior language abilities of large language models (LLM), large vision-language models (LVLM) have been recently proposed by integrating powerful LLMs for improving the performance on complex multimodal tasks. Despite the promising progress on LVLMs, we find that they suffer from object hallucinations, i.e., they tend to generate objects inconsistent with the target images in the descriptions. To investigate it, this work presents the first systematic study on object hallucination of LVLMs. We conduct the evaluation experiments on several representative LVLMs, and show that they mostly suffer from severe object hallucination issues. We further discuss that the visual instructions may influence the hallucination, and find that: objects that frequently appear in the visual instructions or co-occur with the image objects are obviously prone to be hallucinated by LVLMs. Besides, we further design a polling-based query method called POPE for better evaluation of object hallucination. Experiment results show that our POPE can evaluate object hallucination in a more stable and flexible way.