Imed Zitouni


2024

Proceedings of The Second Arabic Natural Language Processing Conference
Nizar Habash | Houda Bouamor | Ramy Eskander | Nadi Tomeh | Ibrahim Abu Farha | Ahmed Abdelali | Samia Touileb | Injy Hamed | Yaser Onaizan | Bashar Alhafni | Wissam Antoun | Salam Khalifa | Hatem Haddad | Imed Zitouni | Badr AlKhamissi | Rawan Almatham | Khalil Mrini
Proceedings of The Second Arabic Natural Language Processing Conference

ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task
Mohammed Khalilia | Sanad Malaysha | Reem Suwaileh | Mustafa Jarrar | Alaa Aljabari | Tamer Elsayed | Imed Zitouni
Proceedings of The Second Arabic Natural Language Processing Conference

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the ability of automated systems to resolve word ambiguity and identify locations mentioned in Arabic text. We provided participants with novel datasets: a sense-annotated corpus for WSD, called SALMA, with approximately 34k annotated tokens, and an LMD dataset with 3,893 annotations and 763 unique location mentions. These are challenging tasks. Of the 38 registered teams, only three participated in the final evaluation phase, with the highest accuracy being 77.8% for WSD and 95.0% for LMD. The shared task not only facilitated the evaluation and comparison of different techniques, but also provided valuable insights and resources for the continued advancement of Arabic NLU technologies.

The FIGNEWS Shared Task on News Media Narratives
Wajdi Zaghouani | Mustafa Jarrar | Nizar Habash | Houda Bouamor | Imed Zitouni | Mona Diab | Samhaa El-Beltagy | Muhammed AbuOdeh
Proceedings of The Second Arabic Natural Language Processing Conference

We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days of the Israel War on Gaza as a case study. The task aims to foster collaboration in developing annotation guidelines for subjective tasks by creating frameworks for analyzing diverse narratives highlighting potential bias and propaganda. In a spirit of fostering and encouraging diversity, we address the problem from a multilingual perspective, namely within five languages: English, French, Arabic, Hebrew, and Hindi. A total of 17 teams participated in two annotation subtasks: bias (16 teams) and propaganda (6 teams). The teams competed in four evaluation tracks: guidelines development, annotation quality, annotation quantity, and consistency. Collectively, the teams produced 129,800 data points. Key findings and implications for the field are discussed.

2023

SamToNe: Improving Contrastive Loss for Dual Encoder Retrieval Models with Same Tower Negatives
Fedor Moiseev | Gustavo Hernandez Abrego | Peter Dornbach | Imed Zitouni | Enrique Alfonseca | Zhe Dong
Findings of the Association for Computational Linguistics: ACL 2023

Dual encoders have been used for retrieval tasks and representation learning with good results. A standard way to train dual encoders is using a contrastive loss with in-batch negatives. In this work, we propose an improved contrastive learning objective that adds queries or documents from the same encoder towers to the negatives, which we name “contrastive loss with SAMe TOwer NEgatives” (SamToNe). By evaluating on question answering retrieval benchmarks from MS MARCO and MultiReQA, and heterogeneous zero-shot information retrieval benchmarks (BEIR), we demonstrate that SamToNe can effectively improve the retrieval quality for both symmetric and asymmetric dual encoders. By directly probing the embedding spaces of the two encoding towers via the t-SNE algorithm (van der Maaten and Hinton, 2008), we observe that SamToNe ensures the alignment between the embedding spaces from the two encoder towers. Based on the analysis of the embedding distance distributions of the top-1 retrieved results, we further explain the efficacy of the method from the perspective of regularisation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Mingxuan Wang | Imed Zitouni
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Proceedings of ArabicNLP 2023
Hassan Sawaf | Samhaa El-Beltagy | Wajdi Zaghouani | Walid Magdy | Ahmed Abdelali | Nadi Tomeh | Ibrahim Abu Farha | Nizar Habash | Salam Khalifa | Amr Keleg | Hatem Haddad | Imed Zitouni | Khalil Mrini | Rawan Almatham
Proceedings of ArabicNLP 2023

2022

Exploring Dual Encoder Architectures for Question Answering
Zhe Dong | Jianmo Ni | Dan Bikel | Enrique Alfonseca | Yuan Wang | Chen Qu | Imed Zitouni
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. There are two major types of dual encoders: Siamese Dual Encoders (SDE), with parameters shared across the two encoders, and Asymmetric Dual Encoders (ADE), with two distinctly parameterized encoders. In this work, we explore dual encoder architectures for QA retrieval tasks. By evaluating on MS MARCO, open domain NQ, and the MultiReQA benchmarks, we show that SDE performs significantly better than ADE. We further propose three improved versions of ADEs. Based on the evaluation of QA retrieval tasks and direct analysis of the embeddings, we demonstrate that sharing parameters in projection layers enables ADEs to perform competitively with SDEs.

2020

Proceedings of the Fifth Arabic Natural Language Processing Workshop
Imed Zitouni | Muhammad Abdul-Mageed | Houda Bouamor | Fethi Bougares | Mahmoud El-Haj | Nadi Tomeh | Wajdi Zaghouani
Proceedings of the Fifth Arabic Natural Language Processing Workshop

2019

Slot Tagging for Task Oriented Spoken Language Understanding in Human-to-Human Conversation Scenarios
Kunho Kim | Rahul Jha | Kyle Williams | Alex Marin | Imed Zitouni
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Task oriented language understanding (LU) in human-to-machine (H2M) conversations has been extensively studied for personal digital assistants. In this work, we extend the task oriented LU problem to human-to-human (H2H) conversations, focusing on the slot tagging task. Recent advances on LU in H2M conversations have shown accuracy improvements by adding encoded knowledge from different sources. Inspired by this, we explore several variants of a bidirectional LSTM architecture that relies on different knowledge sources, such as Web data, search engine click logs, expert feedback from H2M models, as well as previous utterances in the conversation. We also propose ensemble techniques that aggregate these different knowledge sources into a single model. Experimental evaluation on a four-turn Twitter dataset in the restaurant and music domains shows improvements in the slot tagging F1-score of up to 6.09% compared to existing approaches.

Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani
Proceedings of the Fourth Arabic Natural Language Processing Workshop

2018

Bag of Experts Architectures for Model Reuse in Conversational Language Understanding
Rahul Jha | Alex Marin | Suvamsh Shivaprasad | Imed Zitouni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

Slot tagging, the task of detecting entities in input user utterances, is a key component of natural language understanding systems for personal digital assistants. Since each new domain requires a different set of slots, the annotation costs for labeling data for training slot tagging models increase rapidly as the number of domains grows. To tackle this, we describe Bag of Experts (BoE) architectures for model reuse for both LSTM and CRF based models. Extensive experimentation over a dataset of 10 domains drawn from data relevant to our commercial personal digital assistant shows that our BoE models outperform the baseline models with a statistically significant average margin of 5.06% in absolute F1-score when training with 2000 instances per domain, and achieve an even higher improvement of 12.16% when only 25% of the training data is used.

2017

Traitement Automatique des Langues, Volume 58, Numéro 3 : Traitement automatique de l'arabe et des langues apparentées [NLP for Arabic and Related Languages]
Mona Diab | Nizar Habash | Imed Zitouni
Traitement Automatique des Langues, Volume 58, Numéro 3 : Traitement automatique de l'arabe et des langues apparentées [NLP for Arabic and Related Languages]

NLP for Arabic and Related Languages
Mona Diab | Nizar Habash | Imed Zitouni
Traitement Automatique des Langues, Volume 58, Numéro 3 : Traitement automatique de l'arabe et des langues apparentées [NLP for Arabic and Related Languages]

2011

Book Reviews: Introduction to Arabic Natural Language Processing by Nizar Y. Habash
Imed Zitouni
Computational Linguistics, Volume 37, Issue 3 - September 2011

2010

Arabic Named Entity Recognition: Using Features Extracted from Noisy Data
Yassine Benajiba | Imed Zitouni | Mona Diab | Paolo Rosso
Proceedings of the ACL 2010 Conference Short Papers

Improving Mention Detection Robustness to Noisy Input
Radu Florian | John Pitrelli | Salim Roukos | Imed Zitouni
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Enhancing Mention Detection Using Projection via Aligned Corpora
Yassine Benajiba | Imed Zitouni
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Arabic Mention Detection: Toward Better Unit of Analysis
Yassine Benajiba | Imed Zitouni
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Arabic Word Segmentation for Better Unit of Analysis
Yassine Benajiba | Imed Zitouni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Arabic language has a very rich morphology in which a word is composed of zero or more prefixes, a stem, and zero or more suffixes. This makes Arabic data sparse compared to other languages, such as English, and consequently word segmentation becomes very important for many Natural Language Processing tasks that deal with the Arabic language. In this paper we present two segmentation schemes, morphological segmentation and Arabic TreeBank segmentation, and show their impact on an important natural language processing task, mention detection. Experiments on the Arabic TreeBank corpus show 98.1% accuracy on morphological segmentation and 99.4% on Arabic TreeBank segmentation. We also discuss the importance of segmenting the text: experiments show an improvement of up to 6 F points in mention detection performance when morphological segmentation is used instead of not segmenting the text. The results also show an improvement of up to 3 F points when the appropriate segmentation style is used.

2009

Classifier Combination Techniques Applied to Coreference Resolution
Smita Vemulapalli | Xiaoqiang Luo | John F. Pitrelli | Imed Zitouni
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

2008

When Harry Met Harri: Cross-lingual Name Spelling Normalization
Fei Huang | Ahmad Emami | Imed Zitouni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Mention Detection Crossing the Language Barrier
Imed Zitouni | Radu Florian
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Violetta Cavalli-Sforza | Imed Zitouni
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

2006

Factorizing Complex Models: A Case Study in Mention Detection
Radu Florian | Hongyan Jing | Nanda Kambhatla | Imed Zitouni
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Maximum Entropy Based Restoration of Arabic Diacritics
Imed Zitouni | Jeffrey S. Sorensen | Ruhi Sarikaya
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution
Imed Zitouni | Jeffrey Sorensen | Xiaoqiang Luo | Radu Florian
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

Multi-Lingual Coreference Resolution With Syntactic Features
Xiaoqiang Luo | Imed Zitouni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

OrienTel - Telephony Databases Across Northern Africa and the Middle East
Dorota Iskra | Rainer Siemund | Jamal Borno | Asuncion Moreno | Ossama Emam | Khalid Choukri | Oren Gedge | Herbert Tropf | Albino Nogueiras | Imed Zitouni | Anastasios Tsopanoglou | Nikos Fakotakis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

OrienTel - Multilingual access to interactive communication services for the Mediterranean and the Middle East
Rainer Siemund | Barbara Heuft | Khalid Choukri | Ossama Emam | Emmanuel Maragoudakis | Herbert Tropf | Oren Gedge | Sherrie Shammass | Asuncion Moreno | Albino Nogueiras Rodriguez | Imed Zitouni | Dorota Iskra
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)