Muhammed AbuOdeh


2023

pdf bib
CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic
Ahmed Elshabrawy | Muhammed AbuOdeh | Go Inoue | Nizar Habash
Proceedings of ArabicNLP 2023

We present CamelParser2.0, an open-source Python-based Arabic dependency parser targeting two popular Arabic dependency formalisms, the Columbia Arabic Treebank (CATiB), and Universal Dependencies (UD). The CamelParser2.0 pipeline handles the processing of raw text and produces tokenization, part-of-speech and rich morphological features. As part of developing CamelParser2.0, we explore many system design hyper-parameters, such as parsing model architecture and pretrained language model selection, achieving new state-of-the-art performance across diverse Arabic genres under gold and predicted tokenization settings.

2022

pdf bib
Camel Treebank: An Open Multi-genre Arabic Dependency Treebank
Nizar Habash | Muhammed AbuOdeh | Dima Taji | Reem Faraj | Jamila El Gizuli | Omar Kallas
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the Camel Treebank (CAMELTB), a 188K word open-source dependency treebank of Modern Standard and Classical Arabic. CAMELTB 1.0 includes 13 sub-corpora comprising selections of texts from pre-Islamic poetry to social media online commentaries, and covering a range of genres from religious and philosophical texts to news, novels, and student essays. The texts are all publicly available (out of copyright, creative commons, or under open licenses). The texts were morphologically tokenized and syntactically parsed automatically, and then manually corrected by a team of trained annotators. The annotations follow the guidelines of the Columbia Arabic Treebank (CATiB) dependency representation. We discuss our annotation process and guideline extensions, and we present some initial observations on lexical and syntactic differences among the annotated sub-corpora. This corpus will be publicly available to support and encourage research on Arabic NLP in general and on new, previously unexplored genres that are of interest to a wider spectrum of researchers, from historical linguistics and digital humanities to computer-assisted language pedagogy.