2025
Evaluating the Capabilities of Large Language Models for Multi-label Emotion Understanding
Tadesse Destaw Belay | Israel Abebe Azime | Abinew Ali Ayele | Grigori Sidorov | Dietrich Klakow | Philip Slusallek | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the 31st International Conference on Computational Linguistics
Large Language Models (LLMs) show promising learning and reasoning abilities. Compared to other NLP tasks, multilingual and multi-label emotion evaluation tasks are under-explored in LLMs. In this paper, we present EthioEmo, a multi-label emotion classification dataset for four Ethiopian languages, namely, Amharic (amh), Afan Oromo (orm), Somali (som), and Tigrinya (tir). We perform extensive experiments with an additional English multi-label emotion dataset from SemEval 2018 Task 1. Our evaluation includes encoder-only, encoder-decoder, and decoder-only language models. We compare zero-shot and few-shot LLM approaches with fine-tuned smaller language models. The results show that the accuracy of multi-label emotion classification is still insufficient even for high-resource languages such as English, and that there is a large gap between the performance of high-resource and low-resource languages. The results also show varying performance levels depending on the language and model type. EthioEmo is publicly available to further improve the understanding of emotions in language models and of how people convey emotions through various languages.
Multilingual and Explainable Text Detoxification with Parallel Corpora
Daryna Dementieva | Nikolay Babakov | Amit Ronen | Abinew Ali Ayele | Naquee Rizwan | Florian Schneider | Xintong Wang | Seid Muhie Yimam | Daniil Moskovskiy | Elisei Stakovskii | Eran Kaufman | Ashraf Elnagar | Animesh Mukherjee | Alexander Panchenko
Proceedings of the 31st International Conference on Computational Linguistics
Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022), digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or non-toxic form. To date, the availability of parallel corpora for the text detoxification task (Logacheva et al., 2022; Atwell et al., 2022; Dementieva et al., 2024a) has proven to be crucial for state-of-the-art approaches. With this work, we extend the parallel text detoxification corpus to new languages (German, Chinese, Arabic, Hindi, and Amharic) and test TST baselines in an extensive multilingual setup. Next, we conduct a first-of-its-kind automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences, diving deeply into the nuances, similarities, and differences of toxicity and detoxification across 9 languages. Finally, based on the obtained insights, we experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach, enhancing the prompting process through clustering on relevant descriptive attributes.
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Abinew Ali Ayele | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Saminu Mohammad Aliyu | Paul Röttger | Abigail Oppong | Andiswa Bukula | Chiamaka Ijeoma Chukwuneke | Ebrahim Chekol Jibril | Elyas Abdi Ismail | Esubalew Alemneh | Hagos Tesfahun Gebremichael | Lukman Jibril Aliyu | Meriem Beloucif | Oumaima Hourrane | Rooweither Mabuya | Salomey Osei | Samuel Rutunda | Tadesse Destaw Belay | Tadesse Kebede Guge | Tesfa Tegegne Asfaw | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Seid Muhie Yimam | Nedjma Ousidhoum
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is a tweet annotated by native speakers familiar with the regional culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. We find that model performance depends heavily on the language and that multilingual models can help boost performance in low-resource settings.
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Shamsuddeen Hassan Muhammad | Nedjma Ousidhoum | Idris Abdulmumin | Seid Muhie Yimam | Jan Philip Wahle | Terry Lima Ruas | Meriem Beloucif | Christine De Kock | Tadesse Destaw Belay | Ibrahim Said Ahmad | Nirmal Surange | Daniela Teodorescu | David Ifeoluwa Adelani | Alham Fikri Aji | Felermino Dario Mario Ali | Vladimir Araujo | Abinew Ali Ayele | Oana Ignat | Alexander Panchenko | Yi Zhou | Saif Mohammad
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We present our shared task on text-based emotion detection, covering more than 30 languages from seven distinct language families. These languages are predominantly low-resource and spoken across various continents. The data instances are multi-labeled into six emotional classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) emotion labels in monolingual settings, (b) emotion intensity scores, and (c) emotion labels in cross-lingual settings.
2024
EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
Atnafu Lambebo Tonja | Israel Abebe Azime | Tadesse Destaw Belay | Mesay Gemeda Yigezu | Moges Ahmed Ah Mehamed | Abinew Ali Ayele | Ebrahim Chekol Jibril | Michael Melese Woldeyohannis | Olga Kolesnikova | Philipp Slusallek | Dietrich Klakow | Seid Muhie Yimam
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassing a wide array of scripts, and are imbued with profound religious and cultural significance. This paper introduces EthioLLM – multilingual large language models for five Ethiopian languages (Amharic, Ge’ez, Afan Oromo, Somali, and Tigrinya) and English, and Ethiobenchmark – a new benchmark dataset for various downstream NLP tasks. We evaluate the performance of these models across five downstream NLP tasks. We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models. Our dataset and models are available at the https://huggingface.co/EthioNLP repository.
Detecting Hate Speech in Amharic Using Multimodal Analysis of Social Media Memes
Melese Ayichlie Jigar | Abinew Ali Ayele | Seid Muhie Yimam | Chris Biemann
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
In contemporary society, hate speech is increasingly prevalent across various social media platforms, with a notable trend of incorporating memes to amplify its visual impact and reach. Conventional text-based detection approaches frequently fail to address the complexities introduced by memes, thereby aggravating the challenges, particularly in low-resource languages such as Amharic. We develop Amharic meme hate speech detection models using 2,000 memes collected from Facebook, Twitter, and Telegram over four months. We employ native Amharic speakers to annotate each meme using a web-based tool, yielding a Fleiss’ kappa score of 0.50. We utilize different feature extraction techniques, namely VGG16 for images and word2vec for textual content, and build unimodal and multimodal models such as LSTM, BiLSTM, and CNN. The BiLSTM model shows the best performance, achieving 63% accuracy for text and 75% for multimodal features. In image-only experiments, the CNN model achieves 69% accuracy. Multimodal models demonstrate superior performance in detecting Amharic hate speech in memes, showcasing their potential to address the unique challenges posed by meme-based hate speech on social media.
Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse
Abinew Ali Ayele | Esubalew Alemneh Jalew | Adem Chanie Ali | Seid Muhie Yimam | Chris Biemann
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
The prevalence of digital media and evolving sociopolitical dynamics have significantly amplified the dissemination of hateful content. Existing studies mainly focus on classifying texts into binary categories, often overlooking the continuous spectrum of offensiveness and hatefulness inherent in the text. In this research, we present an extensive benchmark dataset for Amharic, comprising 8,258 tweets annotated for three distinct tasks: category classification, identification of hate targets, and rating offensiveness and hatefulness intensities. Our study highlights that a considerable majority of tweets belong to the less offensive and less hateful intensity levels, underscoring the need for early interventions by stakeholders. The prevalence of ethnic and political hatred targets, with significant overlaps in our dataset, emphasizes the complex relationships within Ethiopia’s sociopolitical landscape. We build classification and regression models and investigate the efficacy of models in handling these tasks. Our results reveal that hate and offensive speech cannot be addressed by a simplistic binary classification, instead manifesting as variables across a continuous range of values. The Afro-XLMR-large model exhibits the best performance, achieving F1-scores of 75.30%, 70.59%, and 29.42% for the category, target, and regression tasks, respectively. The 80.22% correlation coefficient of the Afro-XLMR-large model indicates strong alignment.
2023
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Abinew Ali Ayele | Nedjma Ousidhoum | David Ifeoluwa Adelani | Seid Muhie Yimam | Ibrahim Sa'id Ahmad | Meriem Beloucif | Saif M. Mohammad | Sebastian Ruder | Oumaima Hourrane | Pavel Brazdil | Alipio Jorge | Felermino Dário Mário António Ali | Davis David | Salomey Osei | Bello Shehu Bello | Falalu Ibrahim | Tajuddeen Gwadabe | Samuel Rutunda | Tadesse Belay | Wendimu Baye Messelle | Hailu Beshada Balcha | Sisay Adugna Chala | Hagos Tesfahun Gebremichael | Bernard Opoku | Stephen Arthur
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Africa is home to over 2,000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (with over 200 participants, see website: https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the AfriSenti datasets and discuss their usefulness.
Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities
Atnafu Lambebo Tonja | Tadesse Destaw Belay | Israel Abebe Azime | Abinew Ali Ayele | Moges Ahmed Mehamed | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to disseminate information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.
Multilingual Racial Hate Speech Detection Using Transfer Learning
Abinew Ali Ayele | Skadi Dinter | Seid Muhie Yimam | Chris Biemann
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
The rise of social media eases the spread of hateful content, especially racist content with severe consequences. In this paper, we analyze tweets about the death of George Floyd in May 2020, an event that accelerated debates on racism globally. We focus on tweets published in French during the month following Floyd's death. Using the Yandex Toloka platform, we annotate the tweets as hate, offensive, or normal. Tweets that are offensive or hateful are further annotated as racial or non-racial. We build French hate speech detection models based on multilingual BERT and CamemBERT and apply transfer learning by fine-tuning the HateXplain model. We compare different approaches to resolve annotation ties and find that the detection model based on CamemBERT yields the best results in our experiments.
Exploring Amharic Hate Speech Data Collection and Classification Approaches
Abinew Ali Ayele | Seid Muhie Yimam | Tadesse Destaw Belay | Tesfa Asfaw | Chris Biemann
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
In this paper, we present a study of efficient data selection and annotation strategies for Amharic hate speech. We also build various classification models and investigate the challenges of hate speech data selection, annotation, and classification for the Amharic language. From a total of over 18 million tweets in our Twitter corpus, 15.1k tweets are annotated by two independent native speakers, and a Cohen’s kappa score of 0.48 is achieved. A third annotator, a curator, is also employed to decide on the final gold labels. We employ both classical machine learning and deep learning approaches, which include fine-tuning AmFLAIR and AmRoBERTa contextual embedding models. Among all the models, AmFLAIR achieves the best performance with an F1-score of 72%. We publicly release the annotation guidelines, keywords/lexicon entries, datasets, models, and associated scripts with a permissive license.
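As a reading aid for the agreement figure quoted in the abstract above, the following is a minimal sketch of how Cohen's kappa is computed for two annotators. The function name `cohen_kappa` and the toy labels are assumptions made for illustration; they are not taken from the paper or its released scripts.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels, invented for illustration (not from the paper's dataset).
annotator_1 = ["hate", "normal", "offensive", "normal", "hate", "normal"]
annotator_2 = ["hate", "normal", "normal", "normal", "offensive", "normal"]
print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be moderate even when annotators agree on most items.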
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Seid Muhie Yimam | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Nedjma Ousidhoum | Abinew Ali Ayele | Saif Mohammad | Meriem Beloucif | Sebastian Ruder
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We present the first Africentric SemEval shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval). The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba (Muhammad et al., 2023), using data labeled with 3 sentiment classes. We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions. The best performance for tasks A and B was achieved by the NLNDE team with 71.31 and 75.06 weighted F1, respectively. UCAS-IIE-NLP achieved the best average score for task C with 58.15 weighted F1. We describe the approaches adopted by the top 10 systems.
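The scores in the abstract above are reported as weighted F1. As a minimal sketch of that metric, assuming the usual definition (per-class F1 averaged with weights proportional to each class's share of the gold labels), the computation could look as follows. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the shared task's evaluation script.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Average of per-class F1 scores, weighted by gold-label support."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    support = Counter(gold)
    total = 0.0
    for cls, n_cls in support.items():
        # True positives and false positives for this class.
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by how often it occurs in the gold labels.
        total += n_cls * f1
    return total / len(gold)


# Toy labels, invented for illustration (not task data).
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
print(f"weighted F1 = {weighted_f1(gold, pred):.4f}")
```

Weighting by support means frequent classes dominate the score, which matters for imbalanced sentiment data.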