From Dataset to Detection: A Comprehensive Approach to Combating Malayalam Fake News

Devika K; Hariprasath .s.b; Haripriya B; Vigneshwar E; Premjith B.; Bharathi Raja Chakravarthi

Abstract

Identifying fake news disguised as real news is crucial for fighting misinformation and ensuring reliable information, especially in resource-scarce languages like Malayalam. Recognizing the unique challenges that fake news poses in such languages, we present a dataset curated specifically for classifying fake news in Malayalam. The dataset categorizes fake news by its degree of misinformation, making it the first of its kind in this language. Further, we propose baseline models employing multilingual BERT and diverse machine learning classifiers. Our findings indicate that logistic regression trained on LaBSE features demonstrates promising initial performance, with an F1 score of 0.3393. However, addressing the significant data imbalance remains essential for further improving model performance.

Anthology ID: 2024.dravidianlangtech-1.3
Volume: Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Rajeswari Nadarajan, Manikandan Ravikiran
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 16–23
URL: https://aclanthology.org/2024.dravidianlangtech-1.3
Cite (ACL): Devika K, Hariprasath .s.b, Haripriya B, Vigneshwar E, Premjith B, and Bharathi Raja Chakravarthi. 2024. From Dataset to Detection: A Comprehensive Approach to Combating Malayalam Fake News. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 16–23, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): From Dataset to Detection: A Comprehensive Approach to Combating Malayalam Fake News (K et al., DravidianLangTech-WS 2024)
PDF: https://aclanthology.org/2024.dravidianlangtech-1.3.pdf
Video: https://aclanthology.org/2024.dravidianlangtech-1.3.mp4
