Yuxuan Zhang
2025
System Report for CCL25-Eval Task 4: Prompting, Scheduling, and Arbitration Strategies for Chinese Factivity Inference
Liu Daohuan | Xia Lun | Yuxuan Zhang | Xinyu Yang | Fanzhen Kong
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
This report presents the methodology and findings of prompting large language models (LLMs) for Chinese Factivity Inference (FI). We evaluated five LLMs, among which DeepSeek-R1 demonstrated the best overall performance. Chain-of-Thought (CoT) reasoning, few-shot examples, and system-level instructions were combined for the final prompt. Additionally, we introduced a pairwise task scheduling strategy and a multi-agent disagreement arbitration mechanism to further enhance inference quality. Experimental results show that integrating the prompting, scheduling, and arbitration strategies significantly improves performance, with DeepSeek-R1 achieving 91.7% overall accuracy on the evaluation set. The report also highlights findings on LLM behavior in FI tasks and outlines potential directions for future improvement.
Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition
Haorui Wang | Zheng Wang | Yuxuan Zhang | Bo Wang | Bin Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent years have witnessed remarkable advances in Large Language Models (LLMs). However, in the task of social relation recognition, LLMs encounter significant challenges due to their reliance on sequential training data, which inherently restricts their capacity to model complex graph-structured relationships effectively. To address this limitation, we propose a novel low-coupling method synergizing multimodal temporal Knowledge Graphs and Large Language Models (mtKG-LLM) for social relation reasoning. Specifically, we extract multimodal information from the videos and model the social networks as spatial Knowledge Graphs (KGs) for each scene. Temporal KGs are constructed based on the spatial KGs and updated along the timeline for long-term reasoning. Subsequently, we retrieve multi-scale information from the graph-structured knowledge for LLMs to recognize the underlying social relation. Extensive experiments demonstrate that our method achieves state-of-the-art performance in social relation recognition. Furthermore, our framework exhibits effectiveness in bridging the gap between KGs and LLMs. Our code will be released after acceptance.
Interesting Culture: Social Relation Recognition from Videos via Culture De-confounding
Yuxuan Zhang | Yangfu Zhu | Haorui Wang | Bin Wu
Findings of the Association for Computational Linguistics: EMNLP 2025
Social relation recognition, as one of the fundamental tasks in video understanding, contributes to the construction and application of multi-modal knowledge graphs. Previous works have mainly focused on two aspects: generating character graphs and multi-modal fusion. However, they often overlook the impact of cultural differences on relationship recognition. Specifically, relationship recognition models are susceptible to being misled by training data from a specific cultural context. This can result in the learning of culture-specific spurious correlations, ultimately restricting the ability to recognize social relationships across different cultures. Therefore, we employ a customized causal graph to analyze the confounding effects of culture in the relationship recognition task. We propose a Cultural Causal Intervention (CCI) model that mitigates the influence of culture as a confounding factor in the visual and textual modalities. Importantly, we also construct a novel cultural video social relation recognition (CVSR) dataset to facilitate discussion and research on cultural factors in video tasks. Extensive experiments conducted on several datasets demonstrate that the proposed model surpasses state-of-the-art methods.