Shaina Ashraf

2025

MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning
Farizeh Aldabbas | Shaina Ashraf | Rafet Sifa | Lucie Flek
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script

Propaganda, a pervasive tool for influencing public opinion, demands robust automated detection systems, particularly for under-resourced languages. Current efforts largely focus on well-resourced languages like English, leaving significant gaps in languages such as Arabic. This research addresses these gaps by introducing MultiProp Framework, a cross-lingual meta-learning framework designed to enhance propaganda detection across multiple languages, including Arabic, German, Italian, French and English. We constructed a multilingual dataset using data translation techniques, beginning with Arabic data from PTC and WANLP shared tasks, and expanded it with translations into German, Italian and French, further enriched by the SemEval23 dataset. Our proposed framework encompasses three distinct models: MultiProp-Baseline, which combines ensembles of pre-trained models such as GPT-2, mBART, and XLM-RoBERTa; MultiProp-ML, designed to handle languages with minimal or no training data by utilizing advanced meta-learning techniques; and MultiProp-Chunk, which overcomes the challenges of processing longer texts that exceed the token limits of pre-trained models. Together, they deliver superior performance compared to state-of-the-art methods, representing a significant advancement in the field of cross-lingual propaganda detection.
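
The idea behind MultiProp-Chunk, handling texts longer than a model's token limit, can be illustrated with a minimal sketch. The abstract does not say how segments are formed or combined, so the whitespace tokenization, window size, overlap, and mean aggregation below are assumptions for illustration only, and `score_chunk` is a hypothetical stand-in for a real classifier.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], max_len: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens (an assumed design choice).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window already reaches the end of the text.
        if start + max_len >= len(tokens):
            break
    return chunks


def classify_long_text(
    text: str,
    score_chunk: Callable[[str], float],  # hypothetical per-chunk scorer
    max_len: int,
    overlap: int = 0,
) -> float:
    """Score each chunk separately and return the mean score (assumed aggregation)."""
    chunks = split_into_chunks(text.split(), max_len, overlap)
    if not chunks:
        return 0.0
    scores = [score_chunk(" ".join(chunk)) for chunk in chunks]
    return sum(scores) / len(scores)
```

In practice the split would be done on the model's own subword tokens rather than on whitespace, and the aggregation step could be replaced by voting or a learned combiner.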

CAISA at SemEval-2025 Task 7: Multilingual and Cross-lingual Fact-Checked Claim Retrieval
Muqaddas Haroon | Shaina Ashraf | Ipek Baris | Lucie Flek
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We leveraged LLaMA, utilizing its ability to evaluate the relevance of retrieved claims within a retrieval-based fact-checking framework. This approach aimed to explore the impact of large language models (LLMs) on retrieval tasks and assess their effectiveness in enhancing fact-checking accuracy. Additionally, we integrated Jina embeddings v2 and the MPNet multilingual sentence transformer to filter and rank a set of 500 candidate claims. These refined claims were then used as input for LLaMA, ensuring that only the most contextually relevant ones were assessed.
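
The filter-and-rank step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the vectors are toy values standing in for embeddings from models such as Jina embeddings v2 or MPNet, and the use of cosine similarity and a fixed top-k cutoff are assumptions, as are the function names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(
    query_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],  # claim id -> embedding (toy data here)
    k: int,
) -> List[Tuple[str, float]]:
    """Rank candidate claims by similarity to the query and keep the best k."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy usage: three candidate claims in a 2-dimensional embedding space.
shortlist = top_k_candidates(
    [1.0, 0.0],
    {"claim_a": [1.0, 0.0], "claim_b": [0.0, 1.0], "claim_c": [1.0, 1.0]},
    k=2,
)
```

Only the shortlisted claims would then be passed to the LLM for relevance assessment, which keeps the number of expensive model calls small.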

2024

DeFaktS: A German Dataset for Fine-Grained Disinformation Detection through Social Media Framing
Shaina Ashraf | Isabel Bezzaoui | Ionut Andone | Alexander Markowetz | Jonas Fegert | Lucie Flek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In today’s rapidly evolving digital age, disinformation poses a significant threat to public sentiment and socio-political dynamics. To address this, we introduce a new dataset, “DeFaktS”, designed to understand and counter disinformation within German media. Distinctively curated across various news topics, DeFaktS offers an unparalleled insight into the diverse facets of disinformation. Our dataset, containing 105,855 posts with 20,008 meticulously labeled tweets, serves as a rich platform for in-depth exploration of disinformation’s diverse characteristics. A key attribute that sets DeFaktS apart is its fine-grained annotations based on polarized categories. Our annotation framework, grounded in the textual characteristics of news content, eliminates the need for external knowledge sources. Unlike most existing corpora that typically assign a singular global veracity value to news, our methodology seeks to annotate every structural component and semantic element of a news piece, ensuring a comprehensive and detailed understanding. In our experiments, we employed a mix of classical machine learning and advanced transformer-based models. The results underscored the potential of DeFaktS, with transformer models, especially the German variant of BERT, exhibiting pronounced effectiveness in both binary and fine-grained classifications.

Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior
Shaina Ashraf | Fabio Gruschka | Lucie Flek | Charles Welch
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Studies on detecting and understanding the spread of unreliable news on social media have identified key characteristic differences between reliable and unreliable posts. These differences in language use also vary in expression across individuals, making it important to consider personal factors in unreliable news detection. The application of personalization methods for this has been made possible by the recent publication of datasets with user histories, though this area is still largely unexplored. In this paper we present approaches to represent social media users in order to improve performance on three tasks: (1) classification of unreliable news posts, (2) classification of unreliable news spreaders, and (3) prediction of the spread of unreliable news. We compare the User2Vec method from previous work to two other approaches: a learnable user embedding layer trained with the downstream task, and a representation derived from an authorship attribution classifier. We demonstrate that the implemented strategies substantially improve classification performance over state-of-the-art and provide initial results on the task of unreliable news prediction.

Co-authors

Jonas Fegert 1

Fabio Gruschka 1

Muqaddas Haroon 1

Alexander Markowetz 1

Rafet Sifa 1

Charles Welch 1

Venues

WOAH 1
