Nadeem Zaidkilani


2025

pdf bib
Arabic Topic Classification Corpus of the Nakba Short Stories
Osama Hamed | Nadeem Zaidkilani
Proceedings of the first International Workshop on Nakba Narratives as Language Resources

In this paper, we enrich Arabic Natural Language Processing (NLP) resources by introducing the “Nakba Topic Classification Corpus (NTCC),” a novel annotated Arabic corpus derived from narratives about the Nakba. The NTCC comprises approximately 470 sentences extracted from eight short stories and captures the thematic depth of the Nakba narratives, providing insights into both historical and personal dimensions. The corpus was annotated in a two-step process. One third of the dataset was manually annotated, achieving an IAA of 87% (later resolved to 100%), while the rest was annotated using a rule-based system based on thematic patterns. This approach ensures consistency and reproducibility, enhancing the corpus’s reliability for NLP research. The NTCC contributes to the preservation of the Palestinian cultural heritage while addressing key challenges in Arabic NLP, such as data scarcity and linguistic complexity. By like topic modeling and classification tasks, the NTCC offers a valuable resource for advancing Arabic NLP research and fostering a deeper understanding of the Nakba narratives

pdf bib
Exploring Author Style in Nakba Short Stories: A Comparative Study of Transformer-Based Models
Osama Hamed | Nadeem Zaidkilani
Proceedings of the first International Workshop on Nakba Narratives as Language Resources

Measuring semantic similarity and analyzing authorial style are fundamental tasks in Natural Language Processing (NLP), with applications in text classification, cultural analysis, and literary studies. This paper investigates the semantic similarity and stylistic features of Nakba short stories, a key component of Palestinian literature, using transformer-based models, AraBERT, BERT, and RoBERTa. The models effectively capture nuanced linguistic structures, cultural contexts, and stylistic variations in Arabic narratives, outperforming the traditional TF-IDF baseline. By comparing stories of similar length, we minimize biases and ensure a fair evaluation of both semantic and stylistic relationships. Experimental results indicate that RoBERTa achieves slightly higher performance, highlighting its ability to distinguish subtle stylistic patterns. This study demonstrates the potential of AI-driven tools to provide more in-depth insights into Arabic literature, and contributes to the systematic analysis of both semantic and stylistic elements in Nakba narratives.