Ashraf Elnagar


2025

Multilingual and Explainable Text Detoxification with Parallel Corpora
Daryna Dementieva | Nikolay Babakov | Amit Ronen | Abinew Ali Ayele | Naquee Rizwan | Florian Schneider | Xintong Wang | Seid Muhie Yimam | Daniil Alekhseevich Moskovskiy | Elisei Stakovskii | Eran Kaufman | Ashraf Elnagar | Animesh Mukherjee | Alexander Panchenko
Proceedings of the 31st International Conference on Computational Linguistics

Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022), digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or non-toxic form. To date, the availability of parallel corpora for the text detoxification task (Logacheva et al., 2022; Atwell et al., 2022; Dementieva et al., 2024a) has proven to be crucial for state-of-the-art approaches. With this work, we extend the parallel text detoxification corpus to new languages (German, Chinese, Arabic, Hindi, and Amharic) and test TST baselines in an extensive multilingual setup. Next, we conduct a first-of-its-kind automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences, examining the nuances, similarities, and differences of toxicity and detoxification across 9 languages. Finally, based on the obtained insights, we experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach, enhancing the prompting process through clustering on relevant descriptive attributes.

2024

AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval
Muhammad Al-Barham | Imad Afyouni | Khalid Almubarak | Ashraf Elnagar | Ayad Turky | Ibrahim Hashem
Proceedings of The Second Arabic Natural Language Processing Conference

This paper introduces Arabic Contrastive Language-Image Pre-training (AraCLIP), a model designed for Arabic image retrieval tasks, building upon the Contrastive Language-Image Pre-training (CLIP) architecture. AraCLIP leverages Knowledge Distillation to transfer cross-modal knowledge from English to Arabic, enhancing its ability to understand Arabic text and retrieve relevant images. Unlike existing multilingual models, AraCLIP is uniquely positioned to understand the intricacies of the Arabic language, including specific terms, cultural nuances, and contextual constructs. By leveraging the CLIP architecture as our foundation, we introduce a novel approach that seamlessly integrates textual and visual modalities, enabling AraCLIP to effectively retrieve images based on Arabic textual queries. We offer an online demonstration allowing users to input Arabic prompts and compare AraCLIP’s performance with state-of-the-art multilingual models. We conduct comprehensive experiments to evaluate AraCLIP’s performance across diverse datasets, including Arabic XTD-11 and Arabic Flickr 8k. Our results showcase AraCLIP’s superiority in image retrieval accuracy, demonstrating its effectiveness in handling Arabic queries. AraCLIP represents a significant advancement in cross-lingual image retrieval, offering promising applications in Arabic language processing and beyond.

2022

Arabic Image Captioning using Pre-training of Deep Bidirectional Transformers
Jonathan Emami | Pierre Nugues | Ashraf Elnagar | Imad Afyouni
Proceedings of the 15th International Conference on Natural Language Generation

2019

Automatic Text Tagging of Arabic News Articles Using Ensemble Deep Learning Models
Ashraf Elnagar | Omar Einea | Ridhwan Al-Debsi
Proceedings of the 3rd International Conference on Natural Language and Speech Processing