Nedjma Ousidhoum


The Intended Uses of Automated Fact-Checking Artefacts: Why, How and Who
Michael Schlichtkrull | Nedjma Ousidhoum | Andreas Vlachos
Findings of the Association for Computational Linguistics: EMNLP 2023

Automated fact-checking is often presented as an epistemic tool that fact-checkers, social media consumers, and other stakeholders can use to fight misinformation. Nevertheless, few papers thoroughly discuss how. We document this by analysing 100 highly-cited papers, and annotating epistemic elements related to intended use, i.e., means, ends, and stakeholders. We find that narratives leaving out some of these aspects are common, that many papers propose inconsistent means and ends, and that the feasibility of suggested strategies rarely has empirical backing. We argue that this vagueness actively hinders the technology from reaching its goals, as it encourages overclaiming, limits criticism, and prevents stakeholder feedback. Accordingly, we provide several recommendations for thinking and writing about the use of fact-checking artefacts.

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Seid Muhie Yimam | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Nedjma Ousidhoum | Abinew Ali Ayele | Saif Mohammad | Meriem Beloucif | Sebastian Ruder
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present the first Africentric SemEval shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval); the dataset is publicly available. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá (Muhammad et al., 2023), using data labeled with three sentiment classes. We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions. The best performance for Tasks A and B was achieved by the NLNDE team, with weighted F1 scores of 71.31 and 75.06, respectively. UCAS-IIE-NLP achieved the best average score for Task C, with a weighted F1 of 58.15. We describe the approaches adopted by the top 10 systems.
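Submissions above are ranked by weighted F1. The abstract does not define the metric, so the sketch below assumes the common definition (per-class F1 averaged with weights proportional to each class's number of gold instances, as in scikit-learn's `average="weighted"`); the official scorer may differ in details. The function name and the example labels are hypothetical.

```python
from collections import Counter


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Support-weighted average of per-class F1 (assumed definition, not the official scorer)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)  # number of gold instances per class
    total = len(gold)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        n_pred = sum(1 for p in pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes that appear only in predictions have zero support, so they add nothing.
        score += f1 * n_true / total
    return score


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "negative"]
print(weighted_f1(gold, pred))  # 0.5
```

Weighting by support means frequent classes dominate the score, which matters when the three sentiment classes are imbalanced across languages.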

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Shamsuddeen Muhammad | Idris Abdulmumin | Abinew Ayele | Nedjma Ousidhoum | David Adelani | Seid Yimam | Ibrahim Ahmad | Meriem Beloucif | Saif Mohammad | Sebastian Ruder | Oumaima Hourrane | Alipio Jorge | Pavel Brazdil | Felermino Ali | Davis David | Salomey Osei | Bello Shehu-Bello | Falalu Lawan | Tajuddeen Gwadabe | Samuel Rutunda | Tadesse Belay | Wendimu Messelle | Hailu Balcha | Sisay Chala | Hagos Gebremichael | Bernard Opoku | Stephen Arthur
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity of all continents. This includes 75 languages with at least one million speakers each. Yet there is little NLP research on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains more than 110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task, which had over 200 participants. We describe the data collection methodology, the annotation process, and the challenges we faced when curating each dataset. We further report baseline experiments conducted on the AfriSenti datasets and discuss their usefulness.


Varifocal Question Generation for Fact-checking
Nedjma Ousidhoum | Zhangdie Yuan | Andreas Vlachos
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Fact-checking requires retrieving evidence related to a claim under investigation. The task can be formulated as question generation based on a claim, followed by question answering. However, recent question generation approaches assume that the answer is known and typically contained in a passage given as input, whereas such passages are what is being sought when verifying a claim. In this paper, we present Varifocal, a method that generates questions based on different focal points within a given claim, i.e., different spans of the claim and its metadata, such as its source and date. Our method outperforms previous work on a fact-checking question generation dataset on a wide range of automatic evaluation metrics. These results are corroborated by our manual evaluation, which indicates that our method generates more relevant and informative questions. We further demonstrate the potential of focal points in generating sets of clarification questions for product descriptions.


Probing Toxic Content in Large Pre-Trained Language Models
Nedjma Ousidhoum | Xinran Zhao | Tianqing Fang | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Large pre-trained language models (PTLMs) have been shown to carry biases towards different social groups, which leads major NLP systems to reproduce stereotypical and toxic content. We propose a method based on logistic regression classifiers to probe English, French, and Arabic PTLMs and quantify the potentially harmful content that they convey with respect to a set of templates. Each template consists of the name of a social group followed by a cause-effect relation. We use PTLMs to predict masked tokens at the end of each sentence in order to examine how likely they are to produce toxic content about specific communities. Based on evidence from a large-scale study, we shed light on how such negative content can be triggered within unrelated and benign contexts. We then explain how our methodology can be used to assess and mitigate the toxicity transmitted by PTLMs.
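The template construction described above (a social group name, then a cause-effect relation, then a masked token at the end) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `build_probes`, the group placeholders, and the relation phrase are hypothetical, and the mask token depends on the model (BERT-style models use `[MASK]`, RoBERTa-style models use `<mask>`). Feeding the probes to a PTLM and scoring the predicted tokens with logistic regression classifiers is not shown.

```python
from itertools import product


def build_probes(groups: list[str], relations: list[str], mask_token: str = "[MASK]") -> list[str]:
    """Cross every (hypothetical) group name with every cause-effect relation.

    The mask is placed at the end of the sentence, so a masked language model
    is asked to complete the cause given the group and the relation.
    """
    return [f"{group} {relation} {mask_token}." for group, relation in product(groups, relations)]


probes = build_probes(["Group A", "Group B"], ["are criticized because they are"])
for probe in probes:
    print(probe)
# Group A are criticized because they are [MASK].
# Group B are criticized because they are [MASK].
```

Keeping the mask token as a parameter lets the same templates be reused across models with different tokenizers.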


Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets
Nedjma Ousidhoum | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Work on bias in hate speech typically aims to improve classification performance while paying relatively little attention to the quality of the data. We examine selection bias in hate speech in a language- and label-independent fashion. We first use topic models to discover latent semantics in eleven hate speech corpora. We then present two bias evaluation metrics based on the semantic similarity between topics and the search words frequently used to build such corpora. Finally, we discuss the possibility of revising the data collection process by comparing datasets and analyzing contrastive case studies.
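The abstract does not spell out the two metrics, so the following is only a hedged illustration of the general idea of scoring topics against search words: each topic is represented by its top words, and the score averages cosine similarities between those words' embeddings and the embeddings of the search words. The function names, the toy two-dimensional vectors, and the averaging scheme are assumptions, not the paper's definitions.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_keyword_similarity(topics: list[list[str]], keywords: list[str], emb: dict[str, Vector]) -> float:
    """Average over topics of the mean topic-word/search-word cosine similarity.

    Words missing from the (hypothetical) embedding table are skipped;
    topics with no usable pairs do not contribute.
    """
    per_topic = []
    for topic in topics:
        sims = [cosine(emb[w], emb[k]) for w in topic for k in keywords if w in emb and k in emb]
        if sims:
            per_topic.append(sum(sims) / len(sims))
    return sum(per_topic) / len(per_topic) if per_topic else 0.0


# Toy embeddings: "insult" points the same way as the search word "attack"; "rain" is orthogonal.
emb = {"attack": [1.0, 0.0], "insult": [2.0, 0.0], "rain": [0.0, 1.0]}
print(topic_keyword_similarity([["insult"], ["rain"]], ["attack"], emb))  # 0.5
```

A high score would suggest that the discovered topics largely mirror the search words used to collect the corpus, which is one way selection bias can surface.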


Multilingual and Multi-Aspect Hate Speech Analysis
Nedjma Ousidhoum | Zizheng Lin | Hongming Zhang | Yangqiu Song | Dit-Yan Yeung
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current research on hate speech analysis is typically oriented towards monolingual, single-task classification. In this paper, we present a new multilingual, multi-aspect hate speech analysis dataset and use it to test current state-of-the-art multilingual multitask learning approaches. We evaluate our dataset in various classification settings and then discuss how to leverage our annotations to improve hate speech detection and classification in general.