Haitham Seelawi


2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger | Haitham Seelawi | Debora Nozza | Zeerak Talat | Bertie Vidgen
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC’s utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.

2021

ALUE: Arabic Language Understanding Evaluation
Haitham Seelawi | Ibraheem Tuffaha | Mahmoud Gzawi | Wael Farhan | Bashar Talafha | Riham Badawi | Zyad Sober | Oday Al-Dweik | Abed Alhakim Freihat | Hussein Al-Natsheh
Proceedings of the Sixth Arabic Natural Language Processing Workshop

The emergence of multi-task learning (MTL) models in recent years has helped push the state of the art in Natural Language Understanding (NLU). We strongly believe that many NLU problems in Arabic are especially poised to reap the benefits of such models. To this end, we propose the Arabic Language Understanding Evaluation Benchmark (ALUE), based on 8 carefully selected and previously published tasks. For five of these, we provide new privately held evaluation datasets to ensure the fairness and validity of our benchmark. We also provide a diagnostic dataset to help researchers probe the inner workings of their models. Our initial experiments show that MTL models outperform their singly trained counterparts on most tasks. However, to entice participation from the wider community, we publish only singly trained baselines. Nonetheless, our analysis reveals that there is plenty of room for improvement in Arabic NLU. We hope that ALUE will play a part in helping our community realize some of these improvements. Interested researchers are invited to submit their results to our online, publicly accessible leaderboard.

2020

Multi-dialect Arabic BERT for Country-level Dialect Identification
Bashar Talafha | Mohammad Ali | Muhy Eddin Za’ter | Haitham Seelawi | Ibraheem Tuffaha | Mostafa Samir | Wael Farhan | Hussein Al-Natsheh
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Arabic dialect identification is a complex problem because of a number of properties inherent to the language itself. In this paper, we present the experiments conducted and the models developed by our team, Mawdoo3 AI, on the way to our winning solution to subtask 1 of the Nuanced Arabic Dialect Identification (NADI) shared task. The dialect identification subtask provides 21,000 country-level labeled tweets covering all 21 Arab countries. The competition organizers also provide an unlabeled corpus of 10M tweets from the same domain for optional use. Our winning solution is an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask. We publicly release the pre-trained language model component of our winning solution, under the name Multi-dialect-Arabic-BERT, for interested researchers.

2019

Mawdoo3 AI at MADAR Shared Task: Arabic Fine-Grained Dialect Identification with Ensemble Learning
Ahmad Ragab | Haitham Seelawi | Mostafa Samir | Abdelrahman Mattar | Hesham Al-Bataineh | Mohammad Zaghloul | Ahmad Mustafa | Bashar Talafha | Abed Alhakim Freihat | Hussein Al-Natsheh
Proceedings of the Fourth Arabic Natural Language Processing Workshop

In this paper, we discuss several models we used to classify 25 city-level Arabic dialects in addition to Modern Standard Arabic (MSA) as part of the MADAR shared task (subtask 1). We propose an ensemble of the best-performing classifiers from our experiments, built on a varied set of features. Our system achieves a macro F1-score of 69.3% on the DEV dataset, a 1.4% improvement in accuracy over the baseline model. Our best submitted run ranked third out of 19 participating teams on the TEST dataset, only 0.12% macro F1-score behind the top-ranked system.

NSURL-2019 Task 8: Semantic Question Similarity in Arabic
Haitham Seelawi | Ahmad Mustafa | Hesham Al-Bataineh | Wael Farhan | Hussein T. Al-Natsheh
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers