Detecting Deception in Disinformation across Languages: The Role of Linguistic Markers

Alba Perez-Montero, Silvia Gargova, Elena Lloret, Paloma Moreda Pozo


Abstract
The unstoppable proliferation of news driven by the rise of digital media has intensified the challenge of news verification. Natural Language Processing (NLP) offers solutions, primarily through content and context analysis. Recognizing the vital role of linguistic analysis, this paper presents a multilingual study of linguistic markers for automated deceptive fake news detection across English, Spanish, and Bulgarian. We compiled datasets in these languages to extract and analyze both general and specific linguistic markers. We then performed feature selection using the SelectKBest algorithm, applying it to various classification models with different combinations of general and specific linguistic markers. The results show that Logistic Regression and Support Vector Machine classification models achieved F1-scores above 0.8 for English and Spanish. For Bulgarian, Random Forest yielded the best results with an F1-score of 0.73. While these markers demonstrate potential for transferability to other languages, results may vary due to inherent linguistic characteristics. This necessitates further experimentation, especially in low-resource languages like Bulgarian. These findings highlight the significant potential of our dataset and linguistic markers for multilingual deceptive news detection.
Anthology ID:
2025.ranlp-1.108
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
943–952
Language:
URL:
https://aclanthology.org/2025.ranlp-1.108/
DOI:
Bibkey:
Cite (ACL):
Alba Perez-Montero, Silvia Gargova, Elena Lloret, and Paloma Moreda Pozo. 2025. Detecting Deception in Disinformation across Languages: The Role of Linguistic Markers. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 943–952, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Detecting Deception in Disinformation across Languages: The Role of Linguistic Markers (Perez-Montero et al., RANLP 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.ranlp-1.108.pdf