Do We Need Language-Specific Fact-Checking Models? The Case of Chinese

Caiqi Zhang, Zhijiang Guo, Andreas Vlachos


Abstract
This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese using the CHEF dataset. To better reflect real-world fact-checking, we first develop a novel Chinese document-level evidence retriever, achieving state-of-the-art performance. We then demonstrate the limitations of translation-based methods and multilingual language models, highlighting the need for language-specific systems. To better analyze token-level biases in different systems, we construct an adversarial dataset based on the CHEF dataset, where each instance has a large word overlap with the original one but holds the opposite veracity label. Experimental results on the CHEF dataset and our adversarial dataset show that our proposed method outperforms translation-based methods and multilingual language models and is more robust to such biases, underscoring the importance of language-specific fact-checking systems.
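The adversarial construction described in the abstract pairs each original CHEF instance with a counterpart that shares most of its surface tokens but carries the opposite veracity label. The following is a minimal sketch of how such a pair might be validated, assuming character-level Jaccard similarity as the overlap measure; the claims, labels, and the `char_jaccard` helper are illustrative, not the authors' pipeline:

```python
# Hypothetical check (not the paper's actual construction code): verify that an
# adversarial claim keeps high surface overlap with the original claim while
# its veracity label is flipped.

def char_jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity, a simple overlap proxy for Chinese text."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

original = {"claim": "某地2020年粮食产量大幅增加。", "label": "SUPPORTED"}
adversarial = {"claim": "某地2020年粮食产量大幅减少。", "label": "REFUTED"}

overlap = char_jaccard(original["claim"], adversarial["claim"])
assert original["label"] != adversarial["label"]  # opposite veracity labels
print(f"surface overlap: {overlap:.2f}")          # high overlap by design
```

A model relying on token-level shortcuts will tend to assign both claims the same label despite the flipped ground truth, which is what makes such pairs useful for probing bias.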
Anthology ID:
2024.emnlp-main.113
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1899–1914
URL:
https://aclanthology.org/2024.emnlp-main.113
Cite (ACL):
Caiqi Zhang, Zhijiang Guo, and Andreas Vlachos. 2024. Do We Need Language-Specific Fact-Checking Models? The Case of Chinese. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1899–1914, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Do We Need Language-Specific Fact-Checking Models? The Case of Chinese (Zhang et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.113.pdf