We present an overview of the FIGNEWSshared task, organized as part of the Arabic-NLP 2024 conference co-located with ACL2024. The shared task addresses bias and pro-paganda annotation in multilingual news posts.We focus on the early days of the Israel War onGaza as a case study. The task aims to fostercollaboration in developing annotation guide-lines for subjective tasks by creating frame-works for analyzing diverse narratives high-lighting potential bias and propaganda. In aspirit of fostering and encouraging diversity,we address the problem from a multilingualperspective, namely within five languages: En-glish, French, Arabic, Hebrew, and Hindi. Atotal of 17 teams participated in two annota-tion subtasks: bias (16 teams) and propaganda(6 teams). The teams competed in four evalua-tion tracks: guidelines development, annotationquality, annotation quantity, and consistency.Collectively, the teams produced 129,800 datapoints. Key findings and implications for thefield are discussed.
We present Palmyra 3.0, a cloud-based, configurable, and user-friendly platform for morphology and syntax annotation through dependency-tree visualization. Palmyra 3.0 implements a robust system that stores data on the cloud. By default, Palmyra 3.0 comes with an Arabic dependency parser that generates highly accurate trees, but it is easily configurable to support dependency parsers in other languages. Palmyra 3.0 provides default configuration files for a number of predefined formalisms, such as UD and CATiB, and a number of user-friendly features to support annotators.
We present CamelParser2.0, an open-source Python-based Arabic dependency parser targeting two popular Arabic dependency formalisms, the Columbia Arabic Treebank (CATiB), and Universal Dependencies (UD). The CamelParser2.0 pipeline handles the processing of raw text and produces tokenization, part-of-speech and rich morphological features. As part of developing CamelParser2.0, we explore many system design hyper-parameters, such as parsing model architecture and pretrained language model selection, achieving new state-of-the-art performance across diverse Arabic genres under gold and predicted tokenization settings.
We present the Camel Treebank (CAMELTB), a 188K word open-source dependency treebank of Modern Standard and Classical Arabic. CAMELTB 1.0 includes 13 sub-corpora comprising selections of texts from pre-Islamic poetry to social media online commentaries, and covering a range of genres from religious and philosophical texts to news, novels, and student essays. The texts are all publicly available (out of copyright, creative commons, or under open licenses). The texts were morphologically tokenized and syntactically parsed automatically, and then manually corrected by a team of trained annotators. The annotations follow the guidelines of the Columbia Arabic Treebank (CATiB) dependency representation. We discuss our annotation process and guideline extensions, and we present some initial observations on lexical and syntactic differences among the annotated sub-corpora. This corpus will be publicly available to support and encourage research on Arabic NLP in general and on new, previously unexplored genres that are of interest to a wider spectrum of researchers, from historical linguistics and digital humanities to computer-assisted language pedagogy.