Zhiwen Tang
2025
System Report for CCL25-Eval Task 5: Hierarchical Multi-Task Prompt Fine-Tuning and PPO Reinforcement for Classical Chinese Poetry Comprehension and Sentiment Reasoning
Jingjun Tang | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"We present a hierarchical multi-task framework to enhance classical Chinese poetry understand-ing and sentiment reasoning using large language models. Centered on Qwen2.5-14B-Instruction or Xunzi-Qwen-14B, we construct a 1,225-sample corpus of Tang and Song poems with parallel translations and multi-label sentiment annotations (e.g., nostalgia, patriotism, contemplation).The task is divided into comprehension, translation, and sentiment inference, each guided by dynamic prompting and task-specific templates. We employ mixed supervised fine-tuning to better capture syntactic and metaphorical patterns. For sentiment reasoning, we apply proximal policy optimization (PPO) with a custom reward function, boosting accuracy from 0.771 to 0.807(p < 0.01). Our model achieves a 0.714 comprehensive score, outperforming single-task base-lines by 12.6%. Ablation studies further confirm the benefits of multi-task learning in promoting cross-task knowledge transfer.Keywords: Classical Chinese Poetry, Multi-Task Fine-Tuning, Data Augmentation, ProximalPolicy Optimization"
YNUzwt at SemEval-2025 Task 10: Tree-guided Stagewise Classifier for Entity Framing and Narrative Classification
Qiangyu Tan | Yuhang Cui | Zhiwen Tang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents a hierarchical classification framework, designated as the Tree-guided Stagewise Classifier (TGSC), which implements a Chain-of-Thought (CoT) reasoning paradigm for multi-label and multi-class classification in multilingual news article analysis for SemEval-2025 Task 10. The proposed methodology leverages the zero-shot capabilities inherent in Large Language Models (LLMs) through a systematic hierarchical reasoning mechanism. Classification commences from the root nodes and progressively navigates category branches via iterative determinations at each hierarchical tier, culminating in leaf category identification at the final stage. To optimize classification precision, a specialized prompt engineering strategy incorporating hierarchical structural constraints is developed to guide the reasoning trajectory. Experimental results demonstrate the effectiveness of our approach, achieving competitive performance across multiple languages in Subtask 1 and Subtask 2.
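As a rough sketch of the stagewise descent described above, assuming a simple child-map taxonomy (the `ask_llm` stub and the example labels are placeholders, not the task's real hierarchy):

```python
# Tree-guided stagewise classification, illustrated: start at the root,
# have the model pick one child per level, and stop at a leaf.

TAXONOMY = {  # illustrative hierarchy, not the SemEval-2025 Task 10 taxonomy
    "root": ["protagonist", "antagonist"],
    "protagonist": ["guardian", "peacemaker"],
    "antagonist": ["deceiver", "saboteur"],
}

def ask_llm(text: str, options: list[str]) -> str:
    # Stand-in for a zero-shot LLM call whose prompt constrains the answer
    # to `options`; here it trivially returns the first option.
    return options[0]

def classify(text: str, node: str = "root") -> str:
    children = TAXONOMY.get(node)
    if not children:                   # leaf reached: final fine-grained label
        return node
    choice = ask_llm(text, children)   # one decision per hierarchy tier
    return classify(text, choice)

print(classify("Some news article text"))  # -> "guardian" with the stub above
```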
System Report for CCL25-Eval Task 8: ClinSplitFT: Enhancing ICD Coding in Chinese EMRs with Prompt Engineering and Candidate Set Splitting
Pusheng Chen | Qiangyu Tan | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"CCL25-Eval Task 8 focuses on ICD coding from clinical narratives. The challenge of this task lies in the imbalanced and complex label space, with primary diagnoses having a small, focused set of labels and secondary diagnoses involving a much larger, intricate set. To address these challenges, we propose ClinSplitFT (Clinical Code Split Fine-Tuning), a novel framework that enhances ICD coding accuracy using large language models (LLMs). The key innovation of ClinSplitFT is its candidate set split strategy, which splits the full candidate set into several manageable subsets and fine-tunes the model separately on each. During inference, predictions from all subsets are aggregated to produce the final output. This split-based fine-tuning approach enables more focused learning and better generalization in multi-label settings, making it an effective solution for clinical code prediction at scale. Experimental results show significant improvements in ICD coding performance. The code for our system is publicly available at https://github.com/277CPS/ICD-Code-prediction."
System Report for CCL25-Eval Task 6: Enhancing Chinese Essay Rhetoric Recognition through Targeted Data Augmentation and Model Ensemble Voting
Jingjun Tang | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"This paper presents our approach to the Second Chinese Essay Rhetoric Identification and Understanding Competition, which focuses on analyzing rhetorical features in essays written by primary and secondary school students. The competition includes three tasks: multi-label classification of rhetorical forms, divided into 9 coarse-grained and 19 fine-grained categories; multi-label classification of rhetorical content, comprising 5 coarse-grained and 11 fine-grained categories specific to certain rhetorical types; and extraction of rhetorical components, including connectives, descriptive objects, and specific rhetorical content. To address the challenge of limited training data, we applied targeted data augmentation and manual corrections to build a high-quality dataset. We then fine-tuned large language models using one-shot and in-context learning. Finally, we employed an ensemble strategy that integrates model predictions through a voting mechanism. Our system achieved a score of 52.78 and ranked third in the competition."
System Report for CCL25-Eval Task 9: Leveraging Chain-of-Thought and Multi-task Learning for Optimized Traditional Chinese Medicine Diagnosis and Treatment
张坚 | Wei Zhu | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"This paper introduces an intelligent diagnostic system for Traditional Chinese Medicine (TCM) that emulates clinical reasoning through a phased multi-turn dialogue process. The system architecture is divided into three sequential stages: syndrome differentiation, disease diagnosis,and prescription generation. Each stage leverages Chain-of-Thought (CoT) techniques to ensure coherent reasoning, maintaining contextual continuity and consistency throughout the diagnostic process. To optimize model performance, we employ a multi-task fine-tuning approach, combin-ing data from all three stages for training the Qwen2.5-7B-Instruct model. Experimental results show that the system achieves strong performance across all diagnostic tasks. Error analysis re-veals that the accuracy of the first two stages, syndrome differentiation and disease diagnosis, has a significant impact on the quality of the generated prescriptions. This work provides a scalable framework for intelligent TCM diagnosis, advancing both medical knowledge reasoning and the application of domain-specific large language models."
YNU at SemEval-2025 Task 4: Synthetic Token Alternative Training for LLM Unlearning
Yang Chen | Zheyang Luo | Zhiwen Tang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper describes our system submitted to SemEval-2025 Task 4, which introduces the Synthetic Token Alternative Training (STAT) algorithm for efficient unlearning in large language models (LLMs). The proposed method aims to enable pretrained models to selectively forget designated data (the forget set) while preserving performance on the remaining data (the retain set). The STAT framework adopts a dual-stage process. In the first stage, pseudo tokens are generated through random sampling and applied to the forget set, facilitating more effective targeted unlearning. In the second stage, the model undergoes gradient-based optimization using a training scheme that alternates between pseudo-token-augmented samples from the forget set and unmodified samples from the retain set. This design promotes stable unlearning of the specified data while accelerating convergence and preserving the model's general performance. Our system achieved 3rd place in the 7B model track (OLMo-7B) and 7th place in the 1B model track (OLMo-1B), demonstrating substantial improvements over the official baselines, with superior stability in knowledge retention and more effective targeted forgetting than existing approaches.
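A hedged sketch of the alternating scheme as described: random pseudo tokens replace the forget-set targets, and updates alternate between forget and retain batches (the `step` callback and the vocabulary size are assumptions):

```python
# Alternate one update on a pseudo-token-augmented forget batch with one
# update on an unmodified retain batch.

import random

VOCAB_SIZE = 50_000  # assumed tokenizer vocabulary size

def pseudo_targets(token_ids: list[int]) -> list[int]:
    """Replace each target token with a randomly sampled pseudo token."""
    return [random.randrange(VOCAB_SIZE) for _ in token_ids]

def train_epoch(forget_batches, retain_batches, step):
    # `step(inputs, targets)` stands in for one gradient-based update.
    for forget, retain in zip(forget_batches, retain_batches):
        step(forget, pseudo_targets(forget))  # push the model off forget data
        step(retain, retain)                  # re-anchor on retained knowledge
```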
2024
Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation
Xiang Luo | Zhiwen Tang | Jin Wang | Xuejie Zhang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zero-shot dialogue state tracking (DST) seeks to enable dialogue systems to transition to unfamiliar domains without manual annotation or extensive retraining. Prior research has approached this objective by embedding prompts into language models (LMs). Common methodologies include integrating prompts at the input layer or introducing learnable variables at each transformer layer. Nonetheless, each strategy exhibits inherent limitations. Prompts integrated at the input layer risk underutilization, with their impact potentially diminishing across successive transformer layers. Conversely, the addition of learnable variables to each layer can complicate the training process and increase inference latency. To tackle the issues mentioned above, this paper proposes Dual Low-Rank Adaptation (DualLoRA), a plug-and-play architecture designed for zero-shot DST. DualLoRA incorporates two distinct Low-Rank Adaptation (LoRA) components, targeting both dialogue context processing and prompt optimization, to ensure the comprehensive influence of prompts throughout the transformer model layers. This is achieved without incurring additional inference latency, showcasing an efficient integration into existing architectures. Through rigorous evaluation on the MultiWOZ and SGD datasets, DualLoRA demonstrates notable improvements across multiple domains, outperforming traditional baseline methods in zero-shot settings.
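The exact placement and routing of the two LoRA components is the paper's contribution; the layer below is only a conceptual sketch, assuming two parallel low-rank adapters on one frozen linear projection with a boolean position mask deciding which adapter applies:

```python
# One frozen linear layer with two LoRA branches: one intended for dialogue
# context positions, one for prompt positions. Routing by boolean mask is
# an assumption made for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class DualLoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)            # pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)                   # keep the base frozen
        self.ctx_A = nn.Linear(d_in, r, bias=False)   # context LoRA down-proj
        self.ctx_B = nn.Linear(r, d_out, bias=False)  # context LoRA up-proj
        self.prm_A = nn.Linear(d_in, r, bias=False)   # prompt LoRA down-proj
        self.prm_B = nn.Linear(r, d_out, bias=False)  # prompt LoRA up-proj
        nn.init.zeros_(self.ctx_B.weight)             # start as a zero update
        nn.init.zeros_(self.prm_B.weight)

    def forward(self, x: torch.Tensor, is_prompt: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in); is_prompt: (batch, seq) boolean mask
        delta_ctx = self.ctx_B(self.ctx_A(x))
        delta_prm = self.prm_B(self.prm_A(x))
        delta = torch.where(is_prompt.unsqueeze(-1), delta_prm, delta_ctx)
        return self.base(x) + delta
```

Because both branches are low-rank, the added compute per layer is small, in the spirit of the paper's claim of no additional inference latency.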
DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues
Xiang Luo | Zhiwen Tang | Jin Wang | Xuejie Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although large language models (LLMs) exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when tasked with generating responses that effectively guide users towards their goals, particularly in dialogues with intricate constraints and requirements. This paper introduces DuetSim, a novel framework designed to address the intricate demands of task-oriented dialogues by leveraging LLMs. DuetSim stands apart from conventional approaches by employing two LLMs in tandem: one dedicated to response generation and the other focused on verification. This dual LLM approach empowers DuetSim to produce responses that not only exhibit diversity but also demonstrate accuracy and are preferred by human users. We validate the efficacy of our method through extensive experiments conducted on the MultiWOZ dataset, highlighting improvements in response quality and correctness, largely attributed to the incorporation of the second LLM.
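A minimal generate-then-verify loop in the spirit of the dual-LLM design (both model calls are placeholders; the retry policy is an assumption):

```python
# One LLM proposes a user utterance; a second LLM checks it against the
# user goal, and the generator retries on rejection.

def generate(goal: str, history: list[str]) -> str:
    """Placeholder for the generation LLM."""
    return "I'd like a cheap hotel in the north, please."

def verify(goal: str, history: list[str], utterance: str) -> bool:
    """Placeholder for the verification LLM: does the utterance fit the goal?"""
    return True

def simulate_turn(goal: str, history: list[str], max_retries: int = 3) -> str:
    utterance = ""
    for _ in range(max_retries):
        utterance = generate(goal, history)
        if verify(goal, history, utterance):  # second LLM gates the output
            break
    return utterance  # last candidate if no attempt passed verification
```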