Md Tawkat Islam Khondaker

Also published as: Md Tawkat Islam Khondaker


2023

pdf bib
TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties
Karima Kadaoui | Samar Magdy | Abdul Waheed | Md Tawkat Islam Khondaker | Ahmed El-Shangiti | El Moatez Billah Nagoudi | Muhammad Abdul-Mageed
Proceedings of ArabicNLP 2023

Despite the purported multilingual proficiency of instruction-finetuned large language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of these models remains insufficiently explored. Considering this constraint, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiencies across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level dialectal variants. Our analysis indicates that LLMs may encounter challenges with dialects for which minimal public datasets exist, but on average are better translators of dialects than existing commercial systems. On CA and MSA, instruction-tuned LLMs, however, trail behind commercial systems such as Google Translate. Finally, we undertake a human-centric study to scrutinize the efficacy of the relatively recent model, Bard, in following human instructions during translation tasks. Our analysis reveals a circumscribed capability of Bard in aligning with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater for the linguistic and cultural intricacies of diverse communities.

pdf bib
GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP
Md Tawkat Islam Khondaker | Abdul Waheed | El Moatez Billah Nagoudi | Muhammad Abdul-Mageed
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

ChatGPT’s emergence heralds a transformative phase in NLP, particularly demonstrated through its excellent performance on many English benchmarks. However, the model’s efficacy across diverse linguistic contexts remains largely uncharted territory. This work aims to bridge this knowledge gap, with a primary focus on assessing ChatGPT’s capabilities on Arabic languages and dialectal varieties. Our comprehensive study conducts a large-scale automated and human evaluation of ChatGPT, encompassing 44 distinct language understanding and generation tasks on over 60 different datasets. To our knowledge, this marks the first extensive performance analysis of ChatGPT’s deployment in Arabic NLP. Our findings indicate that, despite its remarkable performance in English, ChatGPT is consistently surpassed by smaller models that have undergone finetuning on Arabic. We further undertake a meticulous comparison of ChatGPT and GPT-4’s Modern Standard Arabic (MSA) and Dialectal Arabic (DA), unveiling the relative shortcomings of both models in handling Arabic dialects compared to MSA. Although we further explore and confirm the utility of employing GPT-4 as a potential alternative for human evaluation, our work adds to a growing body of research underscoring the limitations of ChatGPT.

pdf bib
JASMINE: Arabic GPT Models for Few-Shot Learning
El Moatez Billah Nagoudi | Muhammad Abdul-Mageed | AbdelRahim Elmadany | Alcides Inciarte | Md Tawkat Islam Khondaker
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models. For example, we have little knowledge about the potential of these models and their societal impacts in diverse linguistic and cultural settings. We alleviate this issue for Arabic, a wide collection of languages and dialectal varieties with more than 400 million population, by introducing JASMINE. JASMINE is a suite of powerful Arabic autoregressive Transformer language models ranging in size between 300 million-6.7 billion parameters pretrained on a large and diverse dataset ( 235 GB of text). We also carefully design and release a comprehensive benchmark for both automated and human evaluation of Arabic autoregressive models, with coverage of potential social biases, harms, and toxicity. Using our novel benchmark, we evaluate JASMINE extensively showing powerful performance intrinsically as well as in few-shot learning on a wide range of NLP tasks. We aim to responsibly release our models and evaluation benchmark with interested researchers, along with code for experimenting with them.

pdf bib
PACT: Pretraining with Adversarial Contrastive Learning for Text Classification
Md Tawkat Islam Khondaker | Muhammad Abdul-Mageed | Laks Lakshmanan, V.S.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning
Md Tawkat Islam Khondaker | Muhammad Abdul-mageed | Laks Lakshmanan, V.s.
The 7th Workshop on Online Abuse and Harms (WOAH)

The prevalence of abusive language on different online platforms has been a major concern that raises the need for automated cross-platform abusive language detection. However, prior works focus on concatenating data from multiple platforms, inherently adopting Empirical Risk Minimization (ERM) method. In this work, we address this challenge from the perspective of domain generalization objective. We design SCL-Fish, a supervised contrastive learning integrated meta-learning algorithm to detect abusive language on unseen platforms. Our experimental analysis shows that SCL-Fish achieves better performance over ERM and the existing state-of-the-art models. We also show that SCL-Fish is data-efficient and achieves comparable performance with the large-scale pre-trained models upon finetuning for the abusive language detection task.

2022

pdf bib
A Benchmark Study of Contrastive Learning for Arabic Social Meaning
Md Tawkat Islam Khondaker | El Moatez Billah Nagoudi | AbdelRahim Elmadany | Muhammad Abdul-Mageed | Laks Lakshmanan, V.S.
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Contrastive learning (CL) has brought significant progress to various NLP tasks. Despite such a progress, CL has not been applied to Arabic NLP. Nor is it clear how much benefits it could bring to particular classes of tasks such as social meaning (e.g., sentiment analysis, dialect identification, hate speech detection). In this work, we present a comprehensive benchmark study of state-of-the-art supervised CL methods on a wide array of Arabic social meaning tasks. Through an extensive empirical analysis, we show that CL methods outperform vanilla finetuning on most of the tasks. We also show that CL can be data efficient and quantify this efficiency, demonstrating the promise of these methods in low-resource settings vis-a-vis the particular downstream tasks (especially label granularity).