Yuanyuan Sun (孙媛媛, 孙嫒媛) - ACL Anthology

Yuanyuan Sun

Also published as: 媛媛孙, 嫒媛孙

2026

Mechanistic Insights into Deferred Semantic Drift in LLMs
Jingjie Zeng | Huayang Li | Liang Yang | Shaowu Zhang | Yuanyuan Sun | Hongfei Lin
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) face a fundamental challenge with delayed disambiguation: **How is the meaning of an ambiguous word updated when clarifying context arrives only after it has been processed?** While LLMs possess the latent capacity to resolve such ambiguities—as revealed when a full, non-causal context is provided—their unidirectional architecture prevents immediate updates. We investigate the underlying computational mechanism and show this semantic re-evaluation is deferred to subsequent tokens in a process we term "Deferred Semantic Drift (DSD)". Through targeted analysis of attentional pathways, we find that later tokens actively retrieve context-dependent "informational packets" from the ambiguous word’s value vector to steer the final interpretation. We demonstrate this mechanism in metaphor comprehension and provide causal validation by steering model outputs towards literal or metaphorical meanings via targeted activation interventions. This research uncovers a key computational strategy for meaning construction, offering crucial insights for understanding and guiding the behavior of LLMs.

CogEvolve: A Multimodal Benchmark for Evaluating Relational Reasoning in Semantic Extension
Jingjie Zeng | Huayang Li | Liang Yang | Yuanyuan Sun | Shaowu Zhang | Hongfei Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Human cognition excels at extending knowledge through analogy, where word meanings evolve along structured pathways from concrete prototypes to abstract senses via metaphor and metonymy. Do Large Language Models (LLMs) internalize this generative logic, or merely mimic statistical patterns? To investigate this, we introduce CogEvolve, a cognitive linguistic benchmark designed to test these evolutionary pathways across textual and visual modalities. Our evaluation reveals a distinct cognitive profile: models function as "Super-Associators" expert at static recognition yet fail at causal reasoning. In text, they exhibit a Frequency-Primacy Conflation, confusing statistical prevalence with cognitive basicness. Crucially, this reasoning collapses further in the visual domain. We term this deficit the Ungrounded Arrow: models possess high-fidelity concept representations (the "dots") but lack the transformational operators (the "arrows") essential for true relational understanding.

2025

DUTIR at SemEval-2025 Task 10: A Large Language Model-based Approach for Entity Framing in Online News
Tengxiao Lv | Juntao Li | Chao Liu | Yiyang Kang | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We propose a multilingual text processing framework that combines multilingual translation with data augmentation, QLoRA-based multi-model fine-tuning, and GLM-4-Plus-based ensemble classification. By using GLM-4-Plus to translate multilingual texts into English, we enhance data diversity and quantity. Data augmentation effectively improves the model’s performance on imbalanced datasets. QLoRA fine-tuning optimizes the model and reduces classification loss. GLM-4-Plus, as a meta-classifier, further enhances system performance. Our system achieved first place in three languages (English, Portuguese and Russian).

DUT_IR at SemEval-2025 Task 11: Enhancing Multi-Label Emotion Classification with an Ensemble of Pre-trained Language Models and Large Language Models
Chao Liu | Junliang Liu | Tengxiao Lv | Huayang Li | Tao Zeng | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

In this work, we tackle the challenge of multi-label emotion classification, where a sentence can simultaneously express multiple emotions. This task is particularly difficult due to the overlapping nature of emotions and the limited context available in short texts. To address these challenges, we propose an ensemble approach that integrates Pre-trained Language Models (BERT-based models) and Large Language Models, each capturing distinct emotional cues within the text. The predictions from these models are aggregated through a voting mechanism, enhancing classification accuracy. Additionally, we incorporate threshold optimization and class weighting techniques to mitigate class imbalance. Our method demonstrates substantial improvements over baseline models. Our approach ranked 4th out of 90 on the English leaderboard and exhibited strong performance in English in SemEval-2025 Task 11 Track A.

Prototype Tuning: A Meta-Learning Approach for Few-Shot Document-Level Relation Extraction with Large Language Models
Dinghao Pan | Yuanyuan Sun | Bo Xu | Jiru Li | Zhihao Yang | Ling Luo | Hongfei Lin | Jian Wang
Findings of the Association for Computational Linguistics: NAACL 2025

Few-Shot Document-Level Relation Extraction (FSDLRE) aims to develop models capable of generalizing to new categories with minimal support examples. Although Large Language Models (LLMs) demonstrate exceptional In-Context Learning (ICL) capabilities on many few-shot tasks, their performance on FSDLRE tasks remains suboptimal due to the significant gap between the task format and the intrinsic capabilities of language models, coupled with the complexity of ICL prompts for document-level text. To address these challenges, we introduce a novel meta-training approach for LLMs termed Prototype Tuning. We construct simulated episodes using data with relation types that do not overlap with the test corpus, fundamentally enhancing the ICL capabilities of LLMs in FSDLRE through meta-learning. To further enhance the effects of meta-learning, we innovatively integrate the concept of prototype into the fine-tuning process of LLMs. This involves aggregating entity pairs from support documents into prototypes within the prompts and altering the way of determining relation categories to identifying the closest prototype. Experimental results demonstrate that our LLMs trained with this approach outperform all baselines. Our proposed approach markedly improves the ICL capabilities of LLMs in FSDLRE and mitigates the impact of relation semantic discrepancies between the training corpus and the test corpus on model performance.

STATE ToxiCN: A Benchmark for Span-level Target-Aware Toxicity Extraction in Chinese Hate Speech Detection
Zewen Bai | Liang Yang | Shengdi Yin | Junyu Lu | Jingjie Zeng | Haohao Zhu | Yuanyuan Sun | Hongfei Lin
Findings of the Association for Computational Linguistics: ACL 2025

The proliferation of hate speech has caused significant harm to society. The intensity and directionality of hate are closely tied to the target and argument it is associated with. However, research on hate speech detection in Chinese has lagged behind, and existing datasets lack span-level fine-grained annotations. Furthermore, the lack of research on Chinese hateful slang poses a significant challenge. In this paper, we provide two valuable fine-grained Chinese hate speech detection research resources. First, we construct a Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), which is the first span-level Chinese hate speech dataset. Secondly, we evaluate the span-level hate speech detection performance of existing models using STATE ToxiCN. Finally, we conduct the first study on Chinese hateful slang and evaluate the ability of LLMs to understand hate semantics. Our work contributes valuable resources and insights to advance span-level hate speech detection in Chinese.

CCL25-Eval任务12总结报告:面向中文语音的实体关系三元组抽取
Wenxuan Mu | Jinzhong Ning | Yilin Pan | Paerhati Tulajiang | Yuanyuan Sun | SongTao Li | Yanxu Ji | Weiming Yin | Yijia Zhang | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"中文语音实体关系三元组抽取任务(Chinese Speech Entity-Relation Triple Extraction Task, CSRTE)是第二十四届中国计算语言学大会中的一项技术评测,旨在从中文语音数据中自动识别并提取实体及其相互关系,构建结构化的语音关系三元组(头实体、关系、尾实体)。本任务的目标是提升中文语音关系三元组抽取的准确性与效率,增强模型在不同语境和复杂语音场景下的鲁棒性,实现从语音输入到文本三元组输出的全流程自动化处理。通过本次评测,有助于推动中文语音信息抽取技术的发展,促进语音与自然语言处理技术的深度融合,为智能应用提供更加丰富且精准的基础数据支持。此次评测共有257支队伍报名参赛,其中59支队伍提交了A榜成绩。成绩排名前15的队伍晋级A榜,并且表现突出的前朷支队伍提交了技术报告。"

CCL25-Eval任务9系统报告:一种面向中医辨证与处方生成任务的检索增强大模型方法
Yiyang Kang | Yao Jiaqi | Tengxiao Lv | Bo Xu | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"本文面向CCL2025-Eval任务9中的中医辨证辨病与中药处方推荐两个子任务,提出了一套基于大语言模型的系统性方法。在子任务1中,本文基于QLoRA方法对Qwen2.5-7B、Mistral-7B和Baichuan-7B三种预训练模型进行高效微调,并引入多模型集成投票策略。在子任务串中,本文设计了融合向量检索、监督微调与强化学习的中药推荐框架,通过相似度检索构建候选处方集合,并利用强化学习优化模型的生成能力。最终在评测中获得总分0.5171(Task1得分0.5710,Task2得分0.4632),排名第四,验证了所提方法的有效性与实用性。"

基于大语言模型的中文医学命名实体识别
Tengxiao Lv | Ling Luo | Huiyi Lv | Yuanyuan Sun | Jian Wang | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"从中文文本中准确识别医学命名实体是实现中文医疗信息结构化的关键。传统机器学习方法在面对中文医学实体边界模糊和嵌套结构复杂等问题时效果有限。本文提出一种基于大语言模型的中文医学命名实体识别方法,首先通过任务重构将识别过程转化为文本生成任务,设计了适配的标注策略以统一处理平面与嵌套实体,然后引入实体筛选器过滤错误候选实体,最后通过大语言模型决策进行冲突消解与多模型集成提升系统整体鲁棒性。在CMeEE-V2与CCKS2019两个数据集上实验结果显示,所提方法在识别准确性与鲁棒性方面均达到当前先进水平,F1值分别为0.7785和0.8821。"

基于区域顶点标注的司法文本实体关系联合抽取
Yingying Le | Yuanyuan Sun | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"司法领域中的实体关系联合抽取在许多下游任务中(如量刑预测、知识库构建等)具有重要意义。然而,由于垂直领域中的数据资源稀缺,而且司法文本中存在复杂的长句以及关系重叠现象,这使得信息抽取工作颇具挑战性。为应对这一挑战,我们首先标注了一个包含多个罪名的司法领域的专有数据集,然后提出了一种基于三元组区域顶点的联合抽取填表法。我们采用多标签分类对三元组的边界进行标注,以此提取三元组,从而充分利用实体的边界信息。此外,为融入实体对之间的距离信息,我们引入了距离嵌入,并采用扩张卷积来捕捉多尺度上下文信息。我们在司法数据集上对模型进行了评估。实验结果表明,我们的模型在这个数据集上均取得了最先进的性能。"

It’s Not Bragging If You Can Back It Up: Can LLMs Understand Braggings?
Jingjie Zeng | Huayang Li | Liang Yang | Yuanyuan Sun | Hongfei Lin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bragging, as a pervasive social-linguistic phenomenon, reflects complex human interaction patterns. However, the understanding and generation of appropriate bragging behavior in large language models (LLMs) remains underexplored. In this paper, we propose a comprehensive study that combines analytical and controllable approaches to examine bragging in LLMs. We design three tasks, bragging recognition, bragging explanation, and bragging generation, along with novel evaluation metrics to assess the models’ ability to identify bragging intent, social appropriateness, and account for context sensitivity. Our analysis reveals the challenges of bragging in the social context, such as recognizing bragging and responding appropriately with bragging in conversation. This work provides new insights into how LLMs process bragging and highlights the need for more research on generating contextually appropriate behavior in LLMs.

Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
Jingjie Zeng | Liang Yang | Zekun Wang | Yuanyuan Sun | Hongfei Lin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit hate speech has become a significant challenge for online platforms, as it often avoids detection by large language models (LLMs) due to its indirectly expressed hateful intent. This study identifies the limitations of LLMs in detecting implicit hate speech, particularly when disguised as seemingly harmless expressions in a rhetorical device. To address this challenge, we employ a Jailbreaking strategy and Energy-based Constrained Decoding techniques, and design a small model for measuring the energy of metaphorical rhetoric. This approach can lead to LLMs generating metaphorical implicit hate speech. Our research reveals that advanced LLMs, like GPT-4o, frequently misinterpret metaphorical implicit hate speech, and fail to prevent its propagation effectively. Even specialized models, like ShieldGemma and LlamaGuard, demonstrate inadequacies in blocking such content, often misclassifying it as harmless speech. This work points out the vulnerability of current LLMs to implicit hate speech, and emphasizes the improvements to address hate speech threats better.

2024

Breaking the Boundaries: A Unified Framework for Chinese Named Entity Recognition Across Text and Speech
Jinzhong Ning | Yuanyuan Sun | Bo Xu | Zhihao Yang | Ling Luo | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

In recent years, with the vast and rapidly increasing amounts of spoken and textual data, Named Entity Recognition (NER) tasks have evolved into three distinct categories, i.e., text-based NER (TNER), Speech NER (SNER) and Multimodal NER (MNER). However, existing approaches typically require designing separate models for each task, overlooking the potential connections between tasks and limiting the versatility of NER methods. To mitigate these limitations, we introduce a new task named Integrated Multimodal NER (IMNER) to break the boundaries between different modal NER tasks, enabling a unified implementation of them. To achieve this, we first design a unified data format for inputs from different modalities. Then, leveraging the pre-trained MMSpeech model as the backbone, we propose an **I**ntegrated **M**ultimod**a**l **Ge**neration Framework (**IMAGE**), formulating the Chinese IMNER task as an entity-aware text generation task. Experimental results demonstrate the feasibility of our proposed IMAGE framework in the IMNER task. Our work in integrated multimodal learning in advancing the performance of NER may set up a new direction for future research in the field. Our source code is available at https://github.com/NingJinzhong/IMAGE4IMNER.

融合领域词汇扩充的低资源法律文书命名实体识别(Named Entity Recognition for Low-Resource Legal Documents Using Integrated Domain Vocabulary Expansion)
Tulajiang Paerhati (帕尔哈提吐拉江) | Yuanyuan Sun (孙嫒媛) | Aichen Cai (蔡艾辰) | Yanhua Wang (王艳华) | Hongfei Lin (林鸿飞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“目前基于预训练语言模型的司法领域低资源法律文书命名实体识别研究主要面临两个问题:(1)在低资源语言中,如维吾尔语,法律文书相关的语料极其有限,这种语料资源稀缺限制了基于预训练语言模型的训练和性能。(2)法律文书中使用的专业术语不仅复杂且特定,新的法律术语和概念的出现使得现有的模型难以适应。针对上述问题,本文基于多语言预训练模型mBERT,通过领域词汇扩充及模型微调的方法,提升了模型在维吾尔语法律文书命名实体识别任务的性能。本文首先整理并构建维吾尔语司法领域专业词汇列表,并将其添加到mBERT模型的词汇表中。随后,在人工标注的维吾尔语法律文书命名实体数据集UgLaw-NERD上进行模型微调,验证了该方法的有效性。实验结果表明,相比于仅使用mBERT进行微调的基线模型,融合领域词汇扩充的模型在命名实体识别任务上F1得分提升至89.72%,较基线提高了7.39%。此外,本文还探讨了不同领域词汇扩充量对模型命名实体识别性能的影响,结果显示,领域词汇扩充增强了预训练模型在处理维吾尔语任务中的表现。这些结论为其他低资源语言在司法领域开展基于预训练模型的自然语言处理研究提供了有益的参考。”

2023

OD-RTE: A One-Stage Object Detection Framework for Relational Triple Extraction
Jinzhong Ning | Zhihao Yang | Yuanyuan Sun | Zhizheng Wang | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The Relational Triple Extraction (RTE) task is a fundamental and essential information extraction task. Recently, the table-filling RTE methods have received lots of attention. Despite their success, they suffer from some inherent problems such as underutilizing regional information of triple. In this work, we treat the RTE task based on table-filling method as an Object Detection task and propose a one-stage Object Detection framework for Relational Triple Extraction (OD-RTE). In this framework, the vertices-based bounding box detection, coupled with auxiliary global relational triple region detection, ensuring that regional information of triple could be fully utilized. Besides, our proposed decoding scheme could extract all types of triples. In addition, the negative sampling strategy of relations in the training stage improves the training efficiency while alleviating the imbalance of positive and negative relations. The experimental results show that 1) OD-RTE achieves the state-of-the-art performance on two widely used datasets (i.e., NYT and WebNLG). 2) Compared with the best performing table-filling method, OD-RTE achieves faster training and inference speed with lower GPU memory usage. To facilitate future research in this area, the codes are publicly available at https://github.com/NingJinzhong/ODRTE.

2022

Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
Jinzhong Ning | Zhihao Yang | Zhizheng Wang | Yuanyuan Sun | Hongfei Lin | Jian Wang
Proceedings of the 29th International Conference on Computational Linguistics

Chinese Named Entity Recognition (NER) has continued to attract research attention. However, most existing studies only explore the internal features of the Chinese language but neglect other lingual modal features. Actually, as another modal knowledge of the Chinese language, English contains rich prompts about entities that can potentially be applied to improve the performance of Chinese NER. Therefore, in this study, we explore the bilingual enhancement for Chinese NER and propose a unified bilingual interaction module called the Adapted Cross-Transformers with Global Sparse Attention (ACT-S) to capture the interaction of bilingual information. We utilize a model built upon several different ACT-Ss to integrate the rich English information into the Chinese representation. Moreover, our model can learn the interaction of information between bilinguals (inter-features) and the dependency information within Chinese (intra-features). Compared with existing Chinese NER methods, our proposed model can better handle entities with complex structures. The English text that enhances the model is automatically generated by machine translation, avoiding high labour costs. Experimental results on four well-known benchmark datasets demonstrate the effectiveness and robustness of our proposed model.

2021

面向法律文本的实体关系联合抽取算法(Joint Entity and Relation Extraction for Legal Texts)
Wenhui Song (宋文辉) | Xiang Zhou (周翔) | Ping Yang (杨萍) | Yuanyuan Sun (孙媛媛) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

法律文本中包含的丰富信息可以通过结构化的实体关系三元组进行表示,便于法律知识的存储和查询。传统的流水线方法在自动抽取三元组时执行了大量冗余计算,造成了误差传播。而现有的联合学习方法无法适用于有大量重叠关系的法律文本,也并未关注语法结构信息对文本表示的增强,因此本文提出一种面向法律文本的实体关系联合抽取模型。该模型首先通过ON-LSTM注入语法信息,然后引入多头注意力机制分解重叠关系。相较于流水线和其他联合学习方法本文模型抽取效果最佳,在涉毒类法律文本数据集上抽取结果的F1值达到78.7%。

2020

Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen | Yuanyuan Sun | Zhihao Yang | Hongfei Lin
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of the judgment documents on China Judgments Online. It is in great need of enabling machines to understand the semantic information stored in the documents which are transcribed in the form of natural language. The technique of information extraction provides a way of mining the valuable information implied in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts the entities and the semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in Chinese legal domain, which contributes to training supervised triplet extraction models and evaluating the model performance. Our experimental results show that the legal feature introduction and multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction finally reaches 0.836 on the legal dataset.

基于预训练语言模型的案件要素识别方法(A Method for Case Factor Recognition Based on Pre-trained Language Models)
Haishun Liu (刘海顺) | Lei Wang (王雷) | Yanguang Chen (陈彦光) | Shuchen Zhang (张书晨) | Yuanyuan Sun (孙媛媛) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

案件要素识别指将案件描述中重要事实描述自动抽取出来,并根据领域专家设计的要素体系进行分类,是智慧司法领域的重要研究内容。基于传统神经网络的文本编码难以提取深层次特征,基于阈值的多标签分类难以捕获标签间依赖关系,因此本文提出了基于预训练语言模型的多标签文本分类模型。该模型采用以Layer-attentive策略进行特征融合的语言模型作为编码器,使用基于LSTM的序列生成模型作为解码器。在“CAIL2019”数据集上进行实验,该方法比基于循环神经网络的算法在F1值上最高可提升7.6%,在相同超参数设置下比基础语言模型(BERT)提升约3.2%。

Co-authors

Venues