Ossama Obeid


2020

CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid | Nasser Zalmout | Salam Khalifa | Dima Taji | Mai Oudah | Bashar Alhafni | Go Inoue | Fadhl Eryani | Alexander Erdmann | Nizar Habash
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.

2019

ADIDA: Automatic Dialect Identification for Arabic
Ossama Obeid | Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text. The system distinguishes among the dialects of 25 Arab cities (from Rabat to Muscat) in addition to Modern Standard Arabic. The results are presented with either a point map or a heat map visualizing the automatic identification probabilities over a geographical map of the Arab World.

2018

An Arabic Morphological Analyzer and Generator with Copious Features
Dima Taji | Salam Khalifa | Ossama Obeid | Fadhl Eryani | Nizar Habash
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more. This tool includes a fast engine that can be easily integrated into other systems, as well as an easy-to-use API and a web interface. CALIMA-Star also supports morphological reinflection. We evaluate CALIMA-Star against four commonly used analyzers for Arabic in terms of speed and morphological content.

MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
Ossama Obeid | Salam Khalifa | Nizar Habash | Houda Bouamor | Wajdi Zaghouani | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Morphologically Annotated Corpus of Emirati Arabic
Salam Khalifa | Nizar Habash | Fadhl Eryani | Ossama Obeid | Dana Abdulrahim | Meera Al Kaabi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Wajdi Zaghouani | Nizar Habash | Ossama Obeid | Behrang Mohit | Houda Bouamor | Kemal Oflazer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present our guidelines and annotation procedure for creating a human-corrected, post-edited machine translation corpus for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated corpus usually presents many challenges. In order to address these challenges, we created comprehensive and simplified annotation guidelines which were used by a team of five annotators and one lead annotator. In order to ensure high agreement between the annotators, multiple training sessions were held and inter-annotator agreement was measured regularly to check the annotation quality. The created corpus of manually post-edited translations of English to Arabic articles is the largest to date for this language pair.

Guidelines and Framework for a Large Scale Arabic Diacritized Corpus
Wajdi Zaghouani | Houda Bouamor | Abdelati Hawwari | Mona Diab | Ossama Obeid | Mahmoud Ghoneim | Sawsan Alqahtani | Kemal Oflazer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres. The target size of the annotated corpus is 2 million words. We summarize the guidelines and describe issues encountered during the training of the annotators. We also discuss the challenges posed by the complexity of the Arabic language and how they are addressed. Finally, we present the diacritization annotation procedure and detail the quality of the resulting annotations.

2015

The Second QALB Shared Task on Automatic Text Correction for Arabic
Alla Rozovskaya | Houda Bouamor | Nizar Habash | Wajdi Zaghouani | Ossama Obeid | Behrang Mohit
Proceedings of the Second Workshop on Arabic Natural Language Processing

A Pilot Study on Arabic Multi-Genre Corpus Diacritization
Houda Bouamor | Wajdi Zaghouani | Mona Diab | Ossama Obeid | Kemal Oflazer | Mahmoud Ghoneim | Abdelati Hawwari
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

Large Scale Arabic Error Annotation: Guidelines and Framework
Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Ossama Obeid | Nadi Tomeh | Alla Rozovskaya | Noura Farra | Sarah Alkuhlani | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

The First QALB Shared Task on Automatic Text Correction for Arabic
Behrang Mohit | Alla Rozovskaya | Nizar Habash | Wajdi Zaghouani | Ossama Obeid
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

2013

A Web-based Annotation Framework For Large-Scale Text Correction
Ossama Obeid | Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Kemal Oflazer | Nadi Tomeh
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations