System Report for CCL25-Eval Task 4: Prompting, Scheduling, and Arbitration Strategies for Chinese Factivity Inference
Liu Daohuan | Xia Lun | Yuxuan Zhang | Xinyu Yang | Fanzhen Kong
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
This report presents the methodology and findings of prompting large language models (LLMs) for Chinese Factivity Inference (FI). We evaluated five LLMs, among which DeepSeek-R1 demonstrated the best overall performance. The final prompt combined Chain-of-Thought (CoT) reasoning, few-shot examples, and system-level instructions. In addition, we introduced a pairwise task scheduling strategy and a multi-agent disagreement arbitration mechanism to further improve inference quality. Experimental results show that integrating the prompting, scheduling, and arbitration strategies significantly improves performance, with DeepSeek-R1 achieving 91.7% overall accuracy on the evaluation set. The report also highlights findings on LLM behavior in FI tasks and outlines directions for future improvement.
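The disagreement arbitration idea mentioned above can be sketched in miniature: collect labels from several model runs, accept a unanimous verdict directly, and otherwise defer to an arbiter. This is an illustrative sketch only; the function names (`arbitrate`, `toy_arbiter`) and the tie-breaking behavior are assumptions, not the system described in the report.

```python
from collections import Counter

def arbitrate(labels, arbiter_fn):
    """Return the consensus label from multiple agents.

    If all agents agree, accept the label directly; otherwise
    defer the final decision to an arbiter function.
    """
    counts = Counter(labels)
    top, freq = counts.most_common(1)[0]
    if freq == len(labels):        # unanimous agreement
        return top
    return arbiter_fn(labels)      # disagreement: arbiter decides

def toy_arbiter(labels):
    """Stand-in arbiter: pick the majority label (ties broken by
    first occurrence, per Counter.most_common ordering)."""
    return Counter(labels).most_common(1)[0][0]

print(arbitrate(["factive", "factive", "factive"], toy_arbiter))      # factive
print(arbitrate(["factive", "non-factive", "factive"], toy_arbiter))  # factive
```

In the report's setting, the arbiter would itself be an LLM prompted with the disputed instance and the conflicting judgments, rather than a simple majority vote.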