CHIFRAUD: A Long-term Web Text Dataset for Chinese Fraud Detection

Min Tang, Lixin Zou, Zhe Jin, ShuJie Cui, Shiuan Ni Liang, Weiqing Wang


Abstract
Detecting fraudulent online text is essential: such manipulative messages exploit human greed, deceive individuals, and endanger societal security. This task remains under-explored on the Chinese web because no comprehensive dataset of Chinese fraudulent texts exists. Creating such a dataset is challenging, however, because fraudulent texts must be found and annotated within a vast collection of normal texts. In addition, the creators of fraudulent webpages continuously update their tactics to evade detection by downstream platforms and to keep promoting fraudulent messages. To this end, this work first presents a comprehensive long-term dataset of Chinese fraudulent texts. It was collected over 12 months and consists of 59,106 entries extracted from billions of web pages. We also design and provide a wide range of baselines, including large language model-based detectors and pre-trained language model approaches. The dataset and benchmark code needed for further research are available at https://github.com/xuemingxxx/ChiFraud.
Anthology ID: 2025.coling-main.398
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5962–5974
URL: https://aclanthology.org/2025.coling-main.398/
Cite (ACL): Min Tang, Lixin Zou, Zhe Jin, ShuJie Cui, Shiuan Ni Liang, and Weiqing Wang. 2025. CHIFRAUD: A Long-term Web Text Dataset for Chinese Fraud Detection. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5962–5974, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): CHIFRAUD: A Long-term Web Text Dataset for Chinese Fraud Detection (Tang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.398.pdf