Tengxiao Lv
2025
DUT_IR at SemEval-2025 Task 11: Enhancing Multi-Label Emotion Classification with an Ensemble of Pre-trained Language Models and Large Language Models
Chao Liu | Junliang Liu | Tengxiao Lv | Huayang Li | Tao Zeng | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
In this work, we tackle the challenge of multi-label emotion classification, where a sentence can simultaneously express multiple emotions. This task is particularly difficult due to the overlapping nature of emotions and the limited context available in short texts. To address these challenges, we propose an ensemble approach that integrates Pre-trained Language Models (BERT-based models) and Large Language Models, each capturing distinct emotional cues within the text. The predictions from these models are aggregated through a voting mechanism, enhancing classification accuracy. Additionally, we incorporate threshold optimization and class weighting techniques to mitigate class imbalance. Our method demonstrates substantial improvements over baseline models, ranking 4th out of 90 teams on the English leaderboard of SemEval-2025 Task 11 Track A.
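The voting mechanism and per-class threshold optimization described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the probability-averaging scheme and the grid-search range are assumptions.

```python
import numpy as np

def ensemble_vote(prob_matrices, thresholds):
    """Average per-model emotion probabilities, then apply per-class thresholds."""
    avg = np.mean(prob_matrices, axis=0)   # shape: (n_samples, n_emotions)
    return (avg >= thresholds).astype(int)

def f1_binary(y_true, y_pred):
    """Binary F1 for a single emotion class."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_thresholds(dev_probs, dev_labels, grid=None):
    """Pick, per class, the decision threshold that maximizes dev-set F1."""
    grid = np.arange(0.1, 0.9, 0.05) if grid is None else grid
    best = []
    for c in range(dev_labels.shape[1]):
        scores = [f1_binary(dev_labels[:, c], (dev_probs[:, c] >= t).astype(int))
                  for t in grid]
        best.append(grid[int(np.argmax(scores))])
    return np.array(best)
```

Tuning thresholds per class, rather than using a global 0.5 cutoff, is a common way to compensate for class imbalance in multi-label emotion data.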
DUTIR at SemEval-2025 Task 10: A Large Language Model-based Approach for Entity Framing in Online News
Tengxiao Lv | Juntao Li | Chao Liu | Yiyang Kang | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We propose a multilingual text processing framework that combines multilingual translation with data augmentation, QLoRA-based multi-model fine-tuning, and GLM-4-Plus-based ensemble classification. By using GLM-4-Plus to translate multilingual texts into English, we enhance data diversity and quantity. Data augmentation effectively improves the model’s performance on imbalanced datasets. QLoRA fine-tuning optimizes the model and reduces classification loss. GLM-4-Plus, as a meta-classifier, further enhances system performance. Our system achieved first place in three languages (English, Portuguese and Russian).
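A minimal sketch of how QLoRA-style fine-tuning is typically set up with the Hugging Face transformers, peft, and bitsandbytes stack. The base model name and every hyperparameter below are illustrative assumptions, not the system's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",               # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Small low-rank adapters are the only trainable weights.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The same adapter recipe can be repeated over several base models to produce the multiple fine-tuned members of an ensemble.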
System Report for CCL25-Eval Task 9: A Retrieval-Augmented Large Language Model Approach for Traditional Chinese Medicine Syndrome Differentiation and Prescription Generation
Yiyang Kang | Jiaqi Yao | Tengxiao Lv | Bo Xu | Ling Luo | Yuanyuan Sun | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
For the two subtasks of CCL2025-Eval Task 9, Traditional Chinese Medicine syndrome and disease differentiation and Chinese herbal prescription recommendation, this paper proposes a systematic approach based on large language models. In Subtask 1, we efficiently fine-tune three pre-trained models, Qwen2.5-7B, Mistral-7B, and Baichuan-7B, with QLoRA and introduce a multi-model ensemble voting strategy. In Subtask 2, we design a herbal prescription recommendation framework that combines vector retrieval, supervised fine-tuning, and reinforcement learning: similarity retrieval builds a candidate prescription set, and reinforcement learning optimizes the model's generation ability. Our system achieved an overall score of 0.5171 (0.5710 on Task 1 and 0.4632 on Task 2), ranking fourth in the evaluation and demonstrating the effectiveness and practicality of the proposed approach.
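The similarity-retrieval step that builds the candidate prescription set can be sketched as a cosine-similarity lookup over case embeddings. The embedding representation is an assumption here; any sentence encoder would fill that role.

```python
import numpy as np

def top_k_candidates(query_vec, corpus_vecs, k=3):
    """Return indices of the k reference prescriptions whose case embeddings
    are most cosine-similar to the query case."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                     # cosine similarity to every reference case
    return np.argsort(-sims)[:k].tolist()
```

The retrieved prescriptions then serve as in-context candidates for the generator rather than as the final answer.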
System Report for CCL25-Eval Task 8: Improving ICD Coding with Large Language Models via Disease Entity Recognition
Tengxiao Lv | Juntao Li | Chao Liu | Haobin Yuan | Ling Luo | Jian Wang | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
With the widespread adoption of Electronic Medical Records (EMRs), automated coding of the International Classification of Diseases (ICD) has become increasingly essential. However, the complexity of Chinese clinical texts presents significant challenges to traditional methods. To address these issues, CCL25-Eval Task 8 organized the Chinese EMRs ICD Diagnosis Coding Evaluation. This paper presents a method based on Large Language Models (LLMs), which divides the task into primary and other diagnosis coding. For the primary diagnosis, a confidence-guided semantic retrieval strategy is applied, while ensemble learning enhanced with Named Entity Recognition (NER) is used for other diagnoses. The proposed approach achieved 83.42% accuracy on the official test set, ranking second in the evaluation.
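The confidence-guided routing for the primary diagnosis can be sketched as below. The threshold value and the two callables are hypothetical stand-ins for the system's LLM predictor and its semantic retriever over ICD code descriptions.

```python
def code_primary_diagnosis(record, llm_predict, retrieve_code, tau=0.8):
    """Trust the LLM's ICD prediction when its confidence clears tau;
    otherwise fall back to semantic retrieval over code descriptions."""
    code, confidence = llm_predict(record)
    if confidence >= tau:
        return code
    return retrieve_code(record)
```

Routing only low-confidence cases to retrieval keeps the retriever as a safety net without overriding predictions the LLM already gets right.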
Chinese Medical Named Entity Recognition Based on Large Language Models
Tengxiao Lv | Ling Luo | Huiyi Lv | Yuanyuan Sun | Jian Wang | Hongfei Lin
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Accurately recognizing medical named entities in Chinese text is key to structuring Chinese medical information. Traditional machine learning methods are of limited effectiveness when faced with the ambiguous entity boundaries and complex nested structures of Chinese medical text. This paper proposes a Chinese medical named entity recognition method based on large language models. First, the recognition process is reformulated as a text generation task, with a tagging scheme designed to handle flat and nested entities uniformly; then an entity filter is introduced to remove erroneous candidate entities; finally, LLM-based decision making resolves conflicts and integrates multiple models to improve overall robustness. Experiments on the CMeEE-V2 and CCKS2019 datasets show that the proposed method reaches state-of-the-art recognition accuracy and robustness, with F1 scores of 0.7785 and 0.8821, respectively.
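The generation-then-filter pipeline can be sketched as below. The XML-style tag format is an assumed stand-in for the paper's tagging scheme, and this simple regex deliberately ignores nested tags.

```python
import re

# Matches spans tagged like <disease>...</disease> in the model's generation.
TAG = re.compile(r"<(?P<type>[^<>/]+)>(?P<span>[^<>]*)</(?P=type)>")

def parse_entities(generated_text):
    """Read (entity_type, mention) pairs back out of the tagged generation."""
    return [(m.group("type"), m.group("span")) for m in TAG.finditer(generated_text)]

def filter_entities(entities, source_text):
    """Entity filter: keep only candidates whose mention occurs verbatim
    in the source sentence, discarding hallucinated spans."""
    return [(t, s) for t, s in entities if s in source_text]
```

The verbatim-occurrence check is one of the cheapest filters for generation-based NER, since a generative model can emit well-formed tags around text that never appeared in the input.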