Lina Duaibes
2024
Event-Arguments Extraction Corpus and Modeling using BERT for Arabic
Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia
Proceedings of The Second Arabic Natural Language Processing Conference
Event-argument extraction is a challenging task, particularly in Arabic, due to sparse linguistic resources. To fill this gap, we introduce a corpus (550k tokens) that extends Wojood with event-argument annotations. We used three types of event arguments (agent, location, and date), which we annotated as relation types. Our inter-annotator agreement evaluation yielded a Kappa score of 82.23% and an F1-score of 87.2%. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as textual entailment. This method achieves an F1-score of 94.01%. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about 80k tokens) and used it as a second test set, on which our approach achieved promising results (83.59% F1-score). Last but not least, we propose an end-to-end system for event-argument extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at https://sina.birzeit.edu/wojood
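The abstract frames event-argument relation extraction as textual entailment without giving the details. As a minimal sketch under stated assumptions: each (event trigger, candidate argument) pair in a sentence becomes one premise-hypothesis pair per relation type (agent, location, date), and an entailment classifier picks the most strongly entailed hypothesis. The English templates in `TEMPLATES`, the `Candidate` type, the `entail_prob` scorer (a stand-in for a fine-tuned BERT model), and the 0.5 threshold are all hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical English hypothesis templates, one per relation type named in the
# abstract (agent, location, date). The authors' actual templates are not given.
TEMPLATES: Dict[str, str] = {
    "agent": "{arg} is the agent of the event {trigger}.",
    "location": "The event {trigger} took place in {arg}.",
    "date": "The event {trigger} happened on {arg}.",
}


@dataclass
class Candidate:
    sentence: str  # premise: the sentence containing the event
    trigger: str   # event trigger word
    argument: str  # candidate argument mention, e.g. a named entity


def build_pairs(c: Candidate) -> List[Tuple[str, str, str]]:
    """Return (relation, premise, hypothesis) triples, one per relation type."""
    return [
        (rel, c.sentence, tpl.format(arg=c.argument, trigger=c.trigger))
        for rel, tpl in TEMPLATES.items()
    ]


def predict_relation(
    c: Candidate,
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the relation whose hypothesis is most strongly entailed.

    `entail_prob(premise, hypothesis)` is a hypothetical stand-in for a
    fine-tuned BERT entailment classifier. Returns None ("no relation")
    when no hypothesis reaches the threshold.
    """
    scored = [(entail_prob(p, h), rel) for rel, p, h in build_pairs(c)]
    best_score, best_rel = max(scored)
    return best_rel if best_score >= threshold else None


# Usage with a toy scorer in place of a real model.
def toy_scorer(premise: str, hypothesis: str) -> float:
    return 0.9 if "took place in" in hypothesis else 0.1


cand = Candidate("A conference was held in Ramallah yesterday.", "held", "Ramallah")
print(predict_relation(cand, toy_scorer))  # location
```

One appeal of this framing is that a single binary entailment model can cover all three argument types, and the threshold gives a natural "no relation" outcome for entities that are not arguments of the event.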
Sina at FigNews 2024: Multilingual Datasets Annotated with Bias and Propaganda.
Lina Duaibes, Areej Jaber, Mustafa Jarrar, Ahmad Qadi, Mais Qandeel
Proceedings of The Second Arabic Natural Language Processing Conference
The proliferation of bias and propaganda on social media is an increasingly significant concern, leading to the development of techniques for automatic detection. This article presents a multilingual corpus of 12,000 Facebook posts fully annotated for bias and propaganda. The corpus was created as part of the FigNews 2024 Shared Task on News Media Narratives for framing the Israeli War on Gaza. It covers various events during the War from October 7, 2023 to January 31, 2024. The corpus comprises 12,000 posts in five languages (Arabic, Hebrew, English, French, and Hindi), with 2,400 posts for each language. The annotation process involved 10 graduate students specializing in Law. The Inter-Annotator Agreement (IAA) was used to evaluate the annotations of the corpus, with an average IAA of 80.8% for bias and 70.15% for propaganda annotations. Our team was ranked among the best-performing teams in both the Bias and Propaganda subtasks. The corpus is open-source and available at https://sina.birzeit.edu/fada
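Both abstracts report inter-annotator agreement: the first as a Kappa score (82.23%), the second as an average IAA (80.8% for bias, 70.15% for propaganda) without naming the measure. Assuming Cohen's kappa for a pair of annotators (the papers may use a different variant or averaging scheme), the statistic corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, with the function name `cohens_kappa` and the toy labels chosen here for illustration:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not from either corpus).
ann1 = ["biased", "biased", "unbiased", "unbiased"]
ann2 = ["biased", "unbiased", "unbiased", "unbiased"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.5
```

Here the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is p_e = (2*1 + 2*3) / 16 = 0.5, so kappa = 0.25 / 0.5 = 0.5, noticeably lower than the raw 75% agreement.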
Co-authors
- Mustafa Jarrar (2 papers)
- Alaa Aljabari (1 paper)
- Areej Jaber (1 paper)
- Mohammed Khalilia (1 paper)
- Ahmad Qadi (1 paper)