Fuad Mire Hassan
2026
Morphologically-informed Somali Lemmatization Corpus built with a Web-based Crowdsourcing Platform
Abdifatah Ahmed Gedi | Shafie Abdi Mohamed | Yusuf A. Yusuf | Muhidin A. Mohamed | Fuad Mire Hassan | Houssein A Assowe
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Lemmatization, which reduces words to their root forms, plays a key role in tasks such as information retrieval, text indexing, and machine-learning-based language models. However, a key research challenge for low-resource languages such as Somali is the lack of human-annotated lemmatization datasets and reliable ground truth to underpin accurate morphological analysis and the training of relevant NLP models. To address this problem, we developed the first large-scale, purpose-built Somali lemmatization lexicon, coupled with a crowdsourcing platform for ongoing expansion. The system leverages Somali’s agglutinative and derivational morphology, encompassing over 5,584 root words and 78,629 derivative forms, each annotated with part-of-speech tags. For data validation purposes, we devised a pilot lexicon-based lemmatizer integrated with rule-based logic to handle out-of-vocabulary terms. Evaluation on a 294-document corpus covering news articles, social media posts, and short messages shows lemmatization accuracies of 51.27% for full articles, 44.14% for excerpts, and 59.51% for short texts such as tweets. These results demonstrate that combining lexical resources, POS tagging, and rule-based strategies provides a robust and scalable framework for addressing morphological complexity in Somali and other low-resource languages.
2023
MasakhaNEWS: News Topic Classification for African languages
David Ifeoluwa Adelani | Marek Masiak | Israel Abebe Azime | Jesujoba Alabi | Atnafu Lambebo Tonja | Christine Mwase | Odunayo Ogundepo | Bonaventure F. P. Dossou | Akintunde Oladipo | Doreen Nixdorf | Chris Chinenye Emezue | Sana Al-azzawi | Blessing Sibanda | Davis David | Lolwethu Ndolela | Jonathan Mukiibi | Tunde Ajayi | Tatiana Moteu | Brian Odhiambo | Abraham Owodunni | Nnaemeka Obiefuna | Muhidin Mohamed | Shamsuddeen Hassan Muhammad | Teshome Mulugeta Ababu | Saheed Abdullahi Salahudeen | Mesay Gemeda Yigezu | Tajuddeen Gwadabe | Idris Abdulmumin | Mahlet Taye | Oluwabusayo Awoyomi | Iyanuoluwa Shode | Tolulope Adelani | Habiba Abdulganiyu | Abdul-Hakeem Omotayo | Adetola Adeeko | Abeeb Afolabi | Anuoluwapo Aremu | Olanrewaju Samuel | Clemencia Siro | Wangari Kimotho | Onyekachi Ogbu | Chinedu Mbonu | Chiamaka Chukwuneke | Samuel Fanijo | Jessica Ojo | Oyinkansola Awosan | Tadesse Kebede | Toadoum Sari Sakayo | Pamela Nyatsine | Freedmore Sidume | Oreen Yousuf | Mardiyyah Oduwole | Kanda Tshinu | Ussen Kimanuka | Thina Diko | Siyanda Nxakama | Sinodos Nigusse | Abdulmejid Johar | Shafie Mohamed | Fuad Mire Hassan | Moges Ahmed Mehamed | Evrard Ngabire | Jules Jules | Ivan Ssenkungu | Pontus Stenetorp
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Co-authors
- Teshome Mulugeta Ababu 1
- Habiba Abdulganiyu 1
- Idris Abdulmumin 1
- Adetola Adeeko 1
- David Ifeoluwa Adelani 1
- Tolulope Adelani 1
- Abeeb Afolabi 1
- Tunde Ajayi 1
- Sana Al-Azzawi 1
- Jesujoba Alabi 1
- Anuoluwapo Aremu 1
- Houssein A Assowe 1
- Oyinkansola Awosan 1
- Oluwabusayo Awoyomi 1
- Israel Abebe Azime 1
- Chiamaka Chukwuneke 1
- Davis David 1
- Thina Diko 1
- Bonaventure F. P. Dossou 1
- Chris Chinenye Emezue 1
- Samuel Fanijo 1
- Abdifatah Ahmed Gedi 1
- Tajuddeen Gwadabe 1
- Abdulmejid Johar 1
- Jules Jules 1
- Tadesse Kebede 1
- Ussen Kimanuka 1
- Wangari Kimotho 1
- Marek Masiak 1
- Chinedu Mbonu 1
- Moges Ahmed Mehamed 1
- Muhidin Mohamed 1
- Shafie Mohamed 1
- Shafie Abdi Mohamed 1
- Muhidin A. Mohamed 1
- Tatiana Moteu 1
- Shamsuddeen Hassan Muhammad 1
- Jonathan Mukiibi 1
- Christine Mwase 1
- Lolwethu Ndolela 1
- Evrard Ngabire 1
- Sinodos Nigusse 1
- Doreen Nixdorf 1
- Siyanda Nxakama 1
- Pamela Nyatsine 1
- Nnaemeka Obiefuna 1
- Brian Odhiambo 1
- Mardiyyah Oduwole 1
- Onyekachi Ogbu 1
- Odunayo Ogundepo 1
- Jessica Ojo 1
- Akintunde Oladipo 1
- Abdul-Hakeem Omotayo 1
- Abraham Toluwase Owodunni 1
- Toadoum Sari Sakayo 1
- Saheed Abdullahi Salahudeen 1
- Olanrewaju Samuel 1
- Iyanuoluwa Shode 1
- Blessing Kudzaishe Sibanda 1
- Freedmore Sidume 1
- Clemencia Siro 1
- Ivan Ssenkungu 1
- Pontus Stenetorp 1
- Mahlet Taye 1
- Atnafu Lambebo Tonja 1
- Kanda Tshinu 1
- Mesay Gemeda Yigezu 1
- Oreen Yousuf 1
- Yusuf A. Yusuf 1