Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

Yida Mu, Xingyi Song, Kalina Bontcheva, Nikolaos Aletras

Abstract

A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based rumor detection models (i.e., those using only the source post as input) tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is an in-depth evaluation of the performance gap between content- and context-based models, specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on information derived from the rumors' source posts and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper also offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets when training rumor detection methods.
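The abstract refers to data split strategies and temporal concept drift without detailing them. As a hedged, generic illustration only (not the authors' procedure), the sketch below contrasts a random split, where test posts may predate training posts, with a chronological split, where every test post is newer than every training post and so better approximates "previously unknown" rumors. The `Post` record, its fields, and both function names are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical record: a source post, its creation date, and a binary label.
    post_id: str
    created: date
    is_rumor: bool


def random_split(posts, test_fraction=0.2, seed=0):
    """Shuffle, then split; training data may contain posts newer than test posts."""
    shuffled = posts[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def chronological_split(posts, test_fraction=0.2):
    """Train on the oldest posts and test on the newest ones (no look-ahead)."""
    ordered = sorted(posts, key=lambda p: p.created)
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Toy data: ten posts on consecutive days with alternating labels.
    posts = [Post(f"p{i}", date(2020, 1, 1 + i), i % 2 == 0) for i in range(10)]
    train, test = chronological_split(posts)
    print([p.post_id for p in test])
```

Under the chronological split, a model never sees posts from the evaluation period during training, which is one common way to expose the effect of temporal drift that a random split can hide.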

Anthology ID: 2024.lrec-main.595
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 6739–6751
URL: https://aclanthology.org/2024.lrec-main.595/
Cite (ACL): Yida Mu, Xingyi Song, Kalina Bontcheva, and Nikolaos Aletras. 2024. Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6739–6751, Torino, Italia. ELRA and ICCL.
Cite (Informal): Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets (Mu et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.595.pdf
