Sana Al-Azzawi
Also published as: Sana Al-azzawi
2024
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang | David Ifeoluwa Adelani | Sweta Agrawal | Marek Masiak | Ricardo Rei | Eleftheria Briakou | Marine Carpuat | Xuanli He | Sofia Bourhim | Andiswa Bukula | Muhidin Mohamed | Temitayo Olatoye | Tosin Adewumi | Hamam Mokayed | Christine Mwase | Wangui Kimotho | Foutse Yuehgoh | Anuoluwapo Aremu | Jessica Ojo | Shamsuddeen Hassan Muhammad | Salomey Osei | Abdul-Hakeem Omotayo | Chiamaka Chukwuneke | Perez Ogayo | Oumaima Hourrane | Salma El Anigri | Lolwethu Ndolela | Thabiso Mangwana | Shafie Abdi Mohamed | Hassan Ayinde | Oluwabusayo Olufunke Awoyomi | Lama Alkhaled | Sana Al-azzawi | Naome A. Etori | Millicent Ochieng | Clemencia Siro | Njoroge Kiragu | Eric Muchiri | Wangari Kimotho | Lyse Naomi Wamba Momo | Daud Abolade | Simbiat Ajao | Iyanuoluwa Shode | Ricky Macharm | Ruqayya Nasir Iro | Saheed S. Abdullahi | Stephen E. Moore | Bernard Opoku | Zainab Akinjobi | Abeeb Afolabi | Nnaemeka Obiefuna | Onyekachi Raphael Ogbu | Sam Ochieng’ | Verrah Akinyi Otiende | Chinedu Emmanuel Mbonu | Sakayo Toadoum Sari | Yao Lu | Pontus Stenetorp
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
Proceedings of the Sixth Workshop on Teaching NLP
Sana Al-azzawi | Laura Biester | György Kovács | Ana Marasović | Leena Mathur | Margot Mieskes | Leonie Weissweiler
Proceedings of the Sixth Workshop on Teaching NLP
2023
MasakhaNEWS: News Topic Classification for African languages
David Ifeoluwa Adelani | Marek Masiak | Israel Abebe Azime | Jesujoba Alabi | Atnafu Lambebo Tonja | Christine Mwase | Odunayo Ogundepo | Bonaventure F. P. Dossou | Akintunde Oladipo | Doreen Nixdorf | Chris Chinenye Emezue | Sana Al-azzawi | Blessing Sibanda | Davis David | Lolwethu Ndolela | Jonathan Mukiibi | Tunde Ajayi | Tatiana Moteu | Brian Odhiambo | Abraham Owodunni | Nnaemeka Obiefuna | Muhidin Mohamed | Shamsuddeen Hassan Muhammad | Teshome Mulugeta Ababu | Saheed Abdullahi Salahudeen | Mesay Gemeda Yigezu | Tajuddeen Gwadabe | Idris Abdulmumin | Mahlet Taye | Oluwabusayo Awoyomi | Iyanuoluwa Shode | Tolulope Adelani | Habiba Abdulganiyu | Abdul-Hakeem Omotayo | Adetola Adeeko | Abeeb Afolabi | Anuoluwapo Aremu | Olanrewaju Samuel | Clemencia Siro | Wangari Kimotho | Onyekachi Ogbu | Chinedu Mbonu | Chiamaka Chukwuneke | Samuel Fanijo | Jessica Ojo | Oyinkansola Awosan | Tadesse Kebede | Toadoum Sari Sakayo | Pamela Nyatsine | Freedmore Sidume | Oreen Yousuf | Mardiyyah Oduwole | Kanda Tshinu | Ussen Kimanuka | Thina Diko | Siyanda Nxakama | Sinodos Nigusse | Abdulmejid Johar | Shafie Mohamed | Fuad Mire Hassan | Moges Ahmed Mehamed | Evrard Ngabire | Jules Jules | Ivan Ssenkungu | Pontus Stenetorp
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Bipol: Multi-Axes Evaluation of Bias with Explainability in Benchmark Datasets
Tosin Adewumi | Isabella Södergren | Lama Alkhaled | Sana Al-azzawi | Foteini Simistira Liwicki | Marcus Liwicki
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
We investigate five English NLP benchmark datasets (on the SuperGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Questions (BoolQ), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful, and it is known to be common in the data that ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labelled dataset (of 2 million samples), translated from the English version, and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the code, model, and new dataset publicly available.
Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction
Peyman Hosseini | Mehran Hosseini | Sana Al-azzawi | Marcus Liwicki | Ignacio Castro | Matthew Purver
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We study the influence of different activation functions in the output layer of pre-trained transformer models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.
NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset
Sana Al-Azzawi | György Kovács | Filip Nilsson | Tosin Adewumi | Marcus Liwicki
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this paper, we propose a methodology for task 10 of SemEval23, focusing on detecting and classifying online sexism in social media posts. The task is tackling a serious issue, as detecting harmful content on social media platforms is crucial for mitigating the harm of these posts on users. Our solution for this task is based on an ensemble of fine-tuned transformer-based models (BERTweet, RoBERTa, and DeBERTa). To alleviate problems related to class imbalance, and to improve the generalization capability of our model, we also experiment with data augmentation and semi-supervised learning. In particular, for data augmentation, we use back-translation, either on all classes, or on the underrepresented classes only. We analyze the impact of these strategies on the overall performance of the pipeline through extensive experiments. For semi-supervised learning, we found that with a substantial amount of unlabelled, in-domain data available, semi-supervised learning can enhance the performance of certain models. Our proposed method (for which the source code is available on GitHub) attains an F1-score of 0.8613 for sub-task A, which ranked us 10th in the competition.
Co-authors
- Tosin Adewumi 3
- Marcus Liwicki 3
- David Ifeoluwa Adelani 2
- Abeeb Afolabi 2
- Lama Alkhaled 2
- Anuoluwapo Aremu 2
- Chiamaka Chukwuneke 2
- Wangari Kimotho 2
- György Kovács 2
- Marek Masiak 2
- Muhidin Mohamed 2
- Shamsuddeen Hassan Muhammad 2
- Christine Mwase 2
- Lolwethu Ndolela 2
- Nnaemeka Obiefuna 2
- Jessica Ojo 2
- Abdul-Hakeem Omotayo 2
- Iyanuoluwa Shode 2
- Clemencia Siro 2
- Pontus Stenetorp 2
- Teshome Mulugeta Ababu 1
- Habiba Abdulganiyu 1
- Saheed S. Abdullahi 1
- Idris Abdulmumin 1
- Daud Abolade 1
- Adetola Adeeko 1
- Tolulope Adelani 1
- Sweta Agrawal 1
- Simbiat Ajao 1
- Tunde Ajayi 1
- Zainab Akinjobi 1
- Jesujoba Alabi 1
- Oyinkansola Awosan 1
- Oluwabusayo Awoyomi 1
- Oluwabusayo Olufunke Awoyomi 1
- Hassan Ayinde 1
- Israel Abebe Azime 1
- Laura Biester 1
- Sofia Bourhim 1
- Eleftheria Briakou 1
- Andiswa Bukula 1
- Marine Carpuat 1
- Ignacio Castro 1
- Davis David 1
- Thina Diko 1
- Bonaventure F. P. Dossou 1
- Salma El Anigri 1
- Chris Chinenye Emezue 1
- Naome A. Etori 1
- Samuel Fanijo 1
- Tajuddeen Gwadabe 1
- Fuad Mire Hassan 1
- Xuanli He 1
- Peyman Hosseini 1
- Mehran Hosseini 1
- Oumaima Hourrane 1
- Ruqayya Nasir Iro 1
- Abdulmejid Johar 1
- Jules Jules 1
- Tadesse Kebede 1
- Ussen Kimanuka 1
- Wangui Kimotho 1
- Njoroge Kiragu 1
- Yao Lu 1
- Ricky Macharm 1
- Thabiso Mangwana 1
- Ana Marasović 1
- Leena Mathur 1
- Chinedu Mbonu 1
- Chinedu Emmanuel Mbonu 1
- Moges Ahmed Mehamed 1
- Margot Mieskes 1
- Shafie Mohamed 1
- Shafie Abdi Mohamed 1
- Hamam Mokayed 1
- Stephen E. Moore 1
- Tatiana Moteu 1
- Eric Muchiri 1
- Jonathan Mukiibi 1
- Evrard Ngabire 1
- Sinodos Nigusse 1
- Filip Nilsson 1
- Doreen Nixdorf 1
- Siyanda Nxakama 1
- Pamela Nyatsine 1
- Millicent Ochieng 1
- Sam Ochieng’ 1
- Brian Odhiambo 1
- Mardiyyah Oduwole 1
- Perez Ogayo 1
- Onyekachi Ogbu 1
- Onyekachi Raphael Ogbu 1
- Odunayo Ogundepo 1
- Akintunde Oladipo 1
- Temitayo Olatoye 1
- Bernard Opoku 1
- Salomey Osei 1
- Verrah Akinyi Otiende 1
- Abraham Toluwase Owodunni 1
- Matthew Purver 1
- Ricardo Rei 1
- Toadoum Sari Sakayo 1
- Saheed Abdullahi Salahudeen 1
- Olanrewaju Samuel 1
- Blessing Kudzaishe Sibanda 1
- Freedmore Sidume 1
- Foteini Simistira Liwicki 1
- Ivan Ssenkungu 1
- Isabella Södergren 1
- Mahlet Taye 1
- Sakayo Toadoum Sari 1
- Atnafu Lambebo Tonja 1
- Kanda Tshinu 1
- Lyse Naomi Wamba Momo 1
- Jiayi Wang 1
- Leonie Weissweiler 1
- Mesay Gemeda Yigezu 1
- Oreen Yousuf 1
- Foutse Yuehgoh 1