Mohsen Mahmoodzadeh
2024
DRAGON at FIGNEWS 2024 Shared Task: a Dedicated RAG for October 7th conflict News
Sadegh Jafari, Mohsen Mahmoodzadeh, Vanooshe Nazari, Razieh Bahmanyar, Kathryn Burrows
Proceedings of The Second Arabic Natural Language Processing Conference
In this study, we present a novel approach to annotating bias and propaganda in social media data using topic modeling. Using BERTopic, we performed topic modeling on the FIGNEWS Shared Task dataset, which initially comprised 13,500 samples. We identified 35 distinct topics and selected approximately 50 representative samples from each, yielding a subset of 1,812 samples, which we then annotated with bias and propaganda labels. We subsequently applied several methods, including KNN, SVC, XGBoost, and RAG, to build a classifier that detects bias and propaganda in social media content. Our approach demonstrates that topic modeling is effective for efficient data subset selection and provides a foundation for improving the accuracy of bias and propaganda detection in large-scale social media datasets.
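The subset-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each sample already has a topic label and a topic-assignment score (for example, the kind of per-document topics and probabilities a topic model such as BERTopic can return), and it treats "representative" as "highest score within the topic". The abstract does not state how representativeness was measured, so that criterion, the function name `select_representatives`, and the toy data are illustrative assumptions rather than the authors' procedure.

```python
from collections import defaultdict


def select_representatives(topics, scores, per_topic=50, outlier_label=-1):
    """Return sorted indices of up to `per_topic` top-scoring samples per topic.

    Assumption: a higher score means the sample is more representative of
    its topic. Samples labeled `outlier_label` (commonly -1 for documents
    not assigned to any topic) are skipped.
    """
    by_topic = defaultdict(list)
    for idx, (topic, score) in enumerate(zip(topics, scores)):
        if topic == outlier_label:
            continue
        by_topic[topic].append((score, idx))

    selected = []
    for members in by_topic.values():
        # Highest score first; ties are broken by original position.
        members.sort(key=lambda m: (-m[0], m[1]))
        selected.extend(idx for _, idx in members[:per_topic])
    return sorted(selected)


# Toy data: six samples, two topics, one outlier.
topics = [0, 0, 0, 1, 1, -1]
scores = [0.9, 0.4, 0.7, 0.8, 0.6, 0.0]
subset = select_representatives(topics, scores, per_topic=2)
```

With 35 topics and `per_topic=50`, a selection of this kind caps the subset at 1,750 samples, the same order of magnitude as the 1,812 reported above; the returned indices could then be used to pull the texts for annotation.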