Lilian Diana Awuor Wanzare
2025
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Abinew Ali Ayele | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Saminu Mohammad Aliyu | Paul Röttger | Abigail Oppong | Andiswa Bukula | Chiamaka Ijeoma Chukwuneke | Ebrahim Chekol Jibril | Elyas Abdi Ismail | Esubalew Alemneh | Hagos Tesfahun Gebremichael | Lukman Jibril Aliyu | Meriem Beloucif | Oumaima Hourrane | Rooweither Mabuya | Salomey Osei | Samuel Rutunda | Tadesse Destaw Belay | Tadesse Kebede Guge | Tesfa Tegegne Asfaw | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Seid Muhie Yimam | Nedjma Ousidhoum
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is a tweet annotated by native speakers familiar with the regional culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. We find that model performance highly depends on the language and that multilingual models can help boost performance in low-resource settings.
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Shamsuddeen Hassan Muhammad | Nedjma Ousidhoum | Idris Abdulmumin | Jan Philip Wahle | Terry Ruas | Meriem Beloucif | Christine de Kock | Nirmal Surange | Daniela Teodorescu | Ibrahim Said Ahmad | David Ifeoluwa Adelani | Alham Fikri Aji | Felermino D. M. A. Ali | Ilseyar Alimova | Vladimir Araujo | Nikolay Babakov | Naomi Baes | Ana-Maria Bucur | Andiswa Bukula | Guanqun Cao | Rodrigo Tufiño | Rendi Chevi | Chiamaka Ijeoma Chukwuneke | Alexandra Ciobotaru | Daryna Dementieva | Murja Sani Gadanya | Robert Geislinger | Bela Gipp | Oumaima Hourrane | Oana Ignat | Falalu Ibrahim Lawan | Rooweither Mabuya | Rahmad Mahendra | Vukosi Marivate | Alexander Panchenko | Andrew Piper | Charles Henrique Porto Ferreira | Vitaly Protasov | Samuel Rutunda | Manish Shrivastava | Aura Cristina Udrea | Lilian Diana Awuor Wanzare | Sophie Wu | Florian Valentin Wunderlich | Hanif Muhammad Zhafran | Tianhui Zhang | Yi Zhou | Saif M. Mohammad
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition–an umbrella term for several NLP tasks–impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which often lack high-quality annotated datasets. In this paper, we present BRIGHTER–a collection of multi-labeled, emotion-annotated datasets in 28 different languages and across several domains. BRIGHTER primarily covers low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances labeled by fluent speakers. We highlight the challenges related to the data collection and annotation processes, and then report experimental results for monolingual and crosslingual multi-label emotion identification, as well as emotion intensity recognition. We analyse the variability in performance across languages and text domains, both with and without the use of LLMs, and show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.
2019
Detecting Everyday Scenarios in Narrative Texts
Lilian Diana Awuor Wanzare | Michael Roth | Manfred Pinkal
Proceedings of the Second Workshop on Storytelling
Script knowledge consists of detailed information on everyday activities. Such information is often taken for granted in text and needs to be inferred by readers. Therefore, script knowledge is a central component to language comprehension. Previous work on representing scripts is mostly based on extensive manual work or limited to scenarios that can be found with sufficient redundancy in large corpora. We introduce the task of scenario detection, in which we identify references to scripts. In this task, we address a wide range of different scripts (200 scenarios) and we attempt to identify all references to them in a collection of narrative texts. We present a first benchmark data set and a baseline model that tackles scenario detection using techniques from topic segmentation and text classification.
2017
Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering
Lilian Diana Awuor Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics
We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.
Co-authors
- Idris Abdulmumin 2
- David Ifeoluwa Adelani 2
- Ibrahim Said Ahmad 2
- Meriem Beloucif 2
- Andiswa Bukula 2
- Chiamaka Ijeoma Chukwuneke 2
- Oumaima Hourrane 2
- Rooweither Mabuya 2
- Shamsuddeen Hassan Muhammad 2
- Nedjma Ousidhoum 2
- Manfred Pinkal 2
- Samuel Rutunda 2
- Alham Fikri Aji 1
- Esubalew Alemneh 1
- Felermino D. M. A. Ali 1
- Ilseyar Alimova 1
- Saminu Mohammad Aliyu 1
- Lukman Jibril Aliyu 1
- Vladimir Araujo 1
- Tesfa Tegegne Asfaw 1
- Abinew Ali Ayele 1
- Nikolay Babakov 1
- Naomi Baes 1
- Tadesse Destaw Belay 1
- Ana-Maria Bucur 1
- Guanqun Cao 1
- Rendi Chevi 1
- Alexandra Ciobotaru 1
- Daryna Dementieva 1
- Charles Henrique Porto Ferreira 1
- Murja Sani Gadanya 1
- Hagos Tesfahun Gebremichael 1
- Robert Geislinger 1
- Bela Gipp 1
- Tadesse Kebede Guge 1
- Oana Ignat 1
- Elyas Abdi Ismail 1
- Ebrahim Chekol Jibril 1
- Falalu Ibrahim Lawan 1
- Rahmad Mahendra 1
- Vukosi Marivate 1
- Saif Mohammad 1
- Nelson Odhiambo Onyango 1
- Abigail Oppong 1
- Salomey Osei 1
- Alexander Panchenko 1
- Andrew Piper 1
- Vitaly Protasov 1
- Michael Roth 1
- Terry Ruas 1
- Paul Röttger 1
- Manish Shrivastava 1
- Nirmal Surange 1
- Daniela Teodorescu 1
- Stefan Thater 1
- Rodrigo Tufiño 1
- Aura Cristina Udrea 1
- Jan Philip Wahle 1
- Sophie Wu 1
- Florian Valentin Wunderlich 1
- Seid Muhie Yimam 1
- Alessandra Zarcone 1
- Hanif Muhammad Zhafran 1
- Tianhui Zhang 1
- Yi Zhou 1
- Christine de Kock 1