Qiangyu Tan
2025
System Report for CCL25-Eval Task 8: ClinSplitFT: Enhancing ICD Coding in Chinese EMRs with Prompt Engineering and Candidate Set Splitting
Pusheng Chen | Qiangyu Tan | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Pusheng Chen | Qiangyu Tan | Zhiwen Tang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"CCL25-Eval Task 8 focuses on ICD coding from clinical narratives. The challenge of this task lies in the imbalanced and complex label space, with primary diagnoses having a small, focused set of labels and secondary diagnoses involving a much larger, intricate set. To address these challenges, we propose ClinSplitFT (Clinical Code Split Fine-Tuning), a novel framework that enhances ICD coding accuracy using large language models (LLMs). The key innovation of ClinSplitFT is its candidate set split strategy, which splits the full candidate set into several manageable subsets and fine-tunes the model separately on each. During inference, predictions from all subsets are aggregated to produce the final output. This split-based fine-tuning approach enables more focused learning and better generalization in multi-label settings, making it an effective solution for clinical code prediction at scale. Experimental results show significant improvements in ICD coding performance. The code for our system is publicly available at https://github.com/277CPS/ICD-Code-prediction."
YNUzwt at SemEval-2025 Task 10: Tree-guided Stagewise Classifier for Entity Framing and Narrative Classification
Qiangyu Tan | Yuhang Cui | Zhiwen Tang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Qiangyu Tan | Yuhang Cui | Zhiwen Tang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents a hierarchical classification framework, designated as the Tree-guided Stagewise Classifier (TGSC) , which implements a Chain-of-Thought (CoT) reasoning paradigm for addressing multi-label and multi-class classification challenges in multilingual news article analysis in SemEval-2025 Task 10. The proposed methodology leverages the zero-shot capabilities inherent in Large Language Models (LLMs) through a systematic hierarchical reasoning mechanism. This process proceeds through successive hierarchical levels, wherein the classification commences from root nodes and progressively navigates category branches via iterative determinations at each hierarchical tier, ultimately culminating in leaf category identification during the final classification stage. To optimize classification precision, a specialized prompt engineering strategy incorporating hierarchical structural constraints is developed to guide the reasoning trajectory. Experimental results demonstrate the effectiveness of our approach, achieving competitive performance across multiple languages in Subtask 1 and Subtask 2.