Proceedings of the first International Workshop on Nakba Narratives as Language Resources
Mustafa Jarrar | Nizar Habash | Mo El-Haj
Deciphering Implicatures: On NLP and Oral Testimonies
Zainab Sabra
The utterance of a word does not intrinsically convey its intended force. The semantics of utterances is not shaped by the precise references of the words used. Asserting that “it is shameful to abandon our country” does not merely convey information; rather, it performs an act of resilience. In most of our exchanges, we rarely use sentences to describe reality or the world around us. More frequently, our statements aim to express opinions, to influence others, or to be influenced by them. Words carry more than their syntax and semantics; they also embody a pragmatic normative force. The philosophy of language describes this divergence between literal and conveyed meaning as the distinction between sentence meaning and speaker meaning. The former is the literal meaning of the words combined in a sentence; the latter is what the speaker is trying to convey through her expression. To derive speaker meaning from sentence meaning, J.L. Austin (author of How to Do Things with Words) relied on conventions, whereas H.P. Grice (author of “Logic and Conversation”) relied on conventional and non-conventional implicatures. This paper aims to show how we can infer speaker meaning from sentence meaning and thereby capture the force of what has been articulated, focusing specifically on oral testimonies. I argue that oral testimonies are forms of speech acts that aim to produce normative changes. Following this discussion, I examine various natural language processing (NLP) models that make explicit what is implicit in oral testimonies, along with their benefits and limitations. Lastly, I address two challenges: the first concerns implicatures that are not governed by conventions, and the second concerns the biases inherent in hermeneutical approaches.
A cultural shift in Western perceptions of Palestine
Terry Regier | Muhammad Ali Khalidi
We argue that a cultural shift in Western perceptions of Palestine began in the late 1990s and 2000s, leading to increased openness to Palestinian perspectives, including awareness of the Nakba. We present three computational analyses designed to test this idea against data from the 2020 Google Books English dataset. The results support the claim of a cultural shift and help to characterize that shift.
Cognitive Geographies of Catastrophe Narratives: Georeferenced Interview Transcriptions as Language Resource for Models of Forced Displacement
Annie K. Lamar | Rick Castle | Carissa Chappell | Emmanouela Schoinoplokaki | Allene M. Seet | Amit Shilo | Chloe Nahas
We present a machine-understandable geotagged dataset of translated interviews from the Nakba Archive alongside a complete georeferenced dataset of named locations mentioned in the interviews. In a preliminary analysis of this dataset, we find that the cognitive relationship of interviewees to place and spatiality is significantly correlated with gender. Our data also shows that interviewees with birthplaces depopulated in the 1948 Nakba incorporate references to named places in their interviews in substantially different ways than other interviewees. This suggests that the status of the interviewee’s birthplace may impact the way they narrate their experiences. Our work serves as a foundation for continued and expanded statistical and cognitive models of Palestinian forced displacement.
Sentiment Analysis of Nakba Oral Histories: A Critical Study of Large Language Models
Huthaifa I. Ashqar
This study explores the use of Large Language Models (LLMs), specifically ChatGPT, for sentiment analysis of Nakba oral histories, which document the experiences of Palestinian refugees. The study compares sentiment analysis results from full testimonies (averaging 2,500 words) and their summarized versions (300 words). The findings reveal that summarization increased positive sentiment and decreased negative sentiment, suggesting that the process may highlight more hopeful themes while oversimplifying emotional complexities. The study highlights both the potential and the limitations of using LLMs to analyze sensitive, trauma-based narratives and calls for further research to improve sentiment analysis in such contexts.
The Nakba Lexicon: Building a Comprehensive Dataset from Palestinian Literature
Izza AbuHaija | Salim Al Mandhari | Mo El-Haj | Jonas Sibony | Paul Rayson
This paper introduces the Nakba Lexicon, a comprehensive dataset derived from the poetry collection Asifa ‘Ala al-Iz‘aj (Sorry for the Disturbance) by Istiqlal Eid, a Palestinian poet from El-Birweh. Eid’s work poignantly reflects on themes of Palestinian identity, displacement, and resilience, serving as a resource for preserving linguistic and cultural heritage in the context of post-Nakba literature. The dataset is structured into ten thematic domains, including political terminology, memory and preservation, sensory and emotional lexicon, toponyms, nature, and external linguistic influences such as Hebrew, French, and English, thereby capturing the socio-political, emotional, and cultural dimensions of the Nakba. The Nakba Lexicon uniquely emphasises the contributions of women to Palestinian literary traditions, shedding light on often-overlooked narratives of resilience and cultural continuity. Advanced Natural Language Processing (NLP) techniques were employed to analyse the dataset, with fine-tuned pre-trained models such as AraBERT and MARBERT achieving F1-scores of 0.87 and 0.68 in language and lexical classification tasks, respectively, significantly outperforming traditional machine learning models. These results highlight the potential of domain-specific computational models to analyse complex datasets effectively, facilitating the preservation of marginalised voices. By bridging computational methods with cultural preservation, this study enhances the understanding of Palestinian linguistic heritage and contributes to broader efforts in documenting and analysing endangered narratives. The Nakba Lexicon paves the way for future interdisciplinary research, showcasing the role of NLP in addressing historical trauma, resilience, and cultural identity.
Arabic Topic Classification Corpus of the Nakba Short Stories
Osama Hamed | Nadeem Zaidkilani
In this paper, we enrich Arabic Natural Language Processing (NLP) resources by introducing the “Nakba Topic Classification Corpus (NTCC),” a novel annotated Arabic corpus derived from narratives about the Nakba. The NTCC comprises approximately 470 sentences extracted from eight short stories and captures the thematic depth of the Nakba narratives, providing insights into both their historical and personal dimensions. The corpus was annotated in a two-step process. One third of the dataset was annotated manually, with an inter-annotator agreement (IAA) of 87% (disagreements were later resolved to reach 100%); the remaining two thirds were annotated by a rule-based system built on thematic patterns. This approach ensures consistency and reproducibility, enhancing the corpus’s reliability for NLP research. The NTCC contributes to the preservation of Palestinian cultural heritage while addressing key challenges in Arabic NLP, such as data scarcity and linguistic complexity. By supporting tasks such as topic modeling and classification, the NTCC offers a valuable resource for advancing Arabic NLP research and fostering a deeper understanding of the Nakba narratives.
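The abstract reports an IAA of 87% but does not name the agreement measure. The following minimal sketch assumes raw percent agreement between two annotators; Cohen's kappa is a common chance-corrected alternative. The function name and the topic labels are hypothetical and are not drawn from the NTCC.

```python
from typing import Sequence


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of items to which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    # Booleans sum as 0/1, so this counts the matching positions.
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical topic labels for five sentences (illustrative only).
annotator_1 = ["displacement", "homeland", "loss", "return", "loss"]
annotator_2 = ["displacement", "homeland", "return", "return", "loss"]
print(f"{percent_agreement(annotator_1, annotator_2):.0%}")  # 80%
```

Raw agreement is easy to read but ignores agreement that occurs by chance, which matters when a few topics dominate the label distribution.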
Exploring Author Style in Nakba Short Stories: A Comparative Study of Transformer-Based Models
Osama Hamed | Nadeem Zaidkilani
Measuring semantic similarity and analyzing authorial style are fundamental tasks in Natural Language Processing (NLP), with applications in text classification, cultural analysis, and literary studies. This paper investigates the semantic similarity and stylistic features of Nakba short stories, a key component of Palestinian literature, using the transformer-based models AraBERT, BERT, and RoBERTa. The models effectively capture nuanced linguistic structures, cultural contexts, and stylistic variations in Arabic narratives, outperforming the traditional TF-IDF baseline. By comparing stories of similar length, we minimize length-related biases and ensure a fair evaluation of both semantic and stylistic relationships. Experimental results indicate that RoBERTa achieves slightly higher performance, highlighting its ability to distinguish subtle stylistic patterns. This study demonstrates the potential of AI-driven tools to provide deeper insights into Arabic literature and contributes to the systematic analysis of both semantic and stylistic elements in Nakba narratives.
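The abstract names TF-IDF as the baseline. Below is a minimal sketch of a TF-IDF cosine-similarity baseline. It assumes whitespace tokenization and the smoothed IDF formula that scikit-learn's TfidfVectorizer uses by default. The paper's actual preprocessing and configuration are not stated here, and the toy English sentences stand in for Arabic stories.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy stand-ins; Arabic text would need a proper tokenizer, not str.split().
stories = [
    "the village was emptied and the families left",
    "the families left the village at night",
    "olive trees grow on the hills",
]
vecs = tfidf_vectors([s.split() for s in stories])
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        print(i, j, round(cosine(vecs[i], vecs[j]), 3))
```

Because this baseline only matches surface tokens, it cannot relate paraphrases or morphological variants. Contextual models such as AraBERT are designed to handle exactly those cases.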
Detecting Inconsistencies in Narrative Elements of Cross Lingual Nakba Texts
Nada Hamarsheh | Zahia Elabour | Aya Murra | Adnan Yahya
This paper proposes a methodology for contradiction detection in cross-lingual texts about the Nakba. Our pipeline begins with context-aware translation using Google’s Gemini, followed by fact extraction using either Gemini or the TextRank algorithm. We then apply Natural Language Inference (NLI) models trained for this task, such as XLM-RoBERTa and BART, to detect contradictions between different texts about the Nakba. We also describe how the performance of these NLI models is affected by the complexity of some sentences and by the unique syntactic and semantic characteristics of the Arabic language. Additionally, we introduce a method that uses the cosine similarity of fact embeddings to identify missing or underrepresented topics across historical narrative texts. The proposed approach provides insights into biases, contradictions, and gaps in narratives surrounding the Nakba, offering a deeper understanding of historical perspectives.
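The gap-detection idea can be illustrated with a minimal sketch. It assumes that each extracted fact has already been embedded as a vector, using toy 3-dimensional vectors instead of real model outputs, and it uses a hypothetical similarity threshold of 0.8. Under these assumptions, a fact from one text is flagged as missing or underrepresented in another text when its best cosine match there falls below the threshold. The function name `find_gaps`, the threshold, and the facts are illustrative and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_gaps(source: dict[str, Vector], target: dict[str, Vector],
              threshold: float = 0.8) -> list[str]:
    """Hypothetical helper: facts in `source` with no close match in `target`."""
    gaps = []
    for fact, vec in source.items():
        # Best match among the target's facts; an empty target matches nothing.
        best = max((cosine(vec, other) for other in target.values()), default=0.0)
        if best < threshold:
            gaps.append(fact)
    return gaps


# Toy embeddings standing in for sentence-encoder outputs.
text_a = {
    "village shelled": [1.0, 0.0, 0.0],
    "families fled east": [0.0, 1.0, 0.0],
    "land later confiscated": [0.0, 0.0, 1.0],
}
text_b = {
    "the village came under fire": [0.9, 0.1, 0.0],
    "residents fled eastward": [0.1, 0.95, 0.0],
}
print(find_gaps(text_a, text_b))  # ['land later confiscated']
```

The threshold controls the trade-off between the two kinds of error. A higher value flags more facts as gaps, including paraphrases that the embedding model scores imperfectly.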
Multilingual Propaganda Detection: Exploring Transformer-Based Models mBERT, XLM-RoBERTa, and mT5
Mohamed Ibrahim Ragab | Ensaf Hussein Mohamed | Walaa Medhat
This research investigates multilingual propaganda detection using the transformer-based models mBERT, XLM-RoBERTa, and mT5. The study uses a balanced dataset from the BiasFigNews corpus, annotated for propaganda and bias across five languages. The models were fine-tuned to generate embeddings for classification tasks. The evaluation revealed mT5 as the most effective model, achieving an accuracy of 99.61% and an F1-score of 0.9961, followed by mBERT and XLM-RoBERTa with accuracies of 92% and 91.41%, respectively. The findings demonstrate the efficacy of transformer-based embeddings in detecting propaganda while also highlighting challenges in distinguishing subtle class differences. Future work aims to enhance cross-lingual adaptability and explore lightweight models for resource-constrained settings.
Collective Memory and Narrative Cohesion: A Computational Study of Palestinian Refugee Oral Histories in Lebanon
Ghadir A. Awad | Tamara N. Rayan | Lavinia Dunagan | David Gamba
This study uses the Palestinian Oral History Archive (POHA) to investigate how Palestinian refugee groups in Lebanon sustain a cohesive collective memory of the Nakba through shared narratives. Grounded in Halbwachs’ theory of group memory, we conduct a statistical analysis of the pairwise similarity of narratives, focusing on the influence of shared gender and location. We represent the interviews using textual representations and semantic embeddings of their narratives. Our analysis demonstrates that shared origin is a powerful determinant of narrative similarity across thematic keywords, landmarks, and significant figures, as well as in the semantic embeddings of the narratives. Shared residence also fosters cohesion, and its impact is significantly amplified when combined with shared origin. Additionally, women’s narratives exhibit heightened thematic cohesion, particularly in recounting experiences of the British occupation, underscoring the gendered dimensions of memory formation. This research deepens the understanding of collective memory in diasporic settings, emphasizing the critical role of oral histories in safeguarding Palestinian identity and resisting erasure.
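The pairwise-similarity comparison can be sketched as follows. The sketch computes the mean cosine similarity of narrative embeddings for interview pairs that share an attribute, such as origin, and for pairs that do not. The toy 2-dimensional embeddings, the attribute values, and the function name `similarity_by_shared_attribute` are hypothetical and are not POHA data. The abstract does not specify which statistical test the authors use, and this sketch does not implement one.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_by_shared_attribute(interviews: list[dict], attr: str) -> tuple[float, float]:
    """Mean pairwise similarity for pairs sharing `attr` vs. pairs that differ."""
    shared, other = [], []
    for a, b in combinations(interviews, 2):
        sim = cosine(a["embedding"], b["embedding"])
        # Route each pair's similarity to the group it belongs to.
        (shared if a[attr] == b[attr] else other).append(sim)
    if not shared or not other:
        raise ValueError("need at least one pair in each group")
    return mean(shared), mean(other)


# Hypothetical interviews with toy embeddings (illustrative only).
interviews = [
    {"origin": "village A", "embedding": [1.0, 0.0]},
    {"origin": "village A", "embedding": [0.9, 0.1]},
    {"origin": "village B", "embedding": [0.0, 1.0]},
    {"origin": "village B", "embedding": [0.1, 0.9]},
]
same, diff = similarity_by_shared_attribute(interviews, "origin")
print(f"shared origin: {same:.3f}, different origin: {diff:.3f}")
```

A gap between the two means is only descriptive. Claiming a significant effect would require a test such as a permutation test over attribute labels, or a regression that controls for other shared attributes.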
The Missing Cause: An Analysis of Causal Attributions in Reporting on Palestine
Paulina Garcia Corral | Hannah Bechara | Krishnamoorthy Manohara | Slava Jankin
Missing cause bias is a specific type of bias in media reporting that consists of consistently omitting causal attribution for specific events, for example by omitting the actors responsible for incidents. Identifying these patterns in news outlets can help assess the level of bias present in media content. In this paper, we examine the prevalence of this bias in reporting on Palestine by identifying causal constructions in headlines. We compare headlines covering the Israel-Palestine conflict from three major news outlets: CNN, the BBC, and AJ (Al Jazeera). We also compare these findings with data on the Ukraine-Russia war to analyze editorial style within each press organization. We annotate a subset of this data and evaluate two causal language models (UniCausal and GPT-4o) on the identification and extraction of causal language in news headlines. Using the top-performing model, GPT-4o, we machine-annotate the full corpus and analyze the prevalence of missing cause bias within and across news organizations. Our findings reveal that BBC headlines tend to avoid directly attributing causality to Israel for the violence in Gaza, both in comparison with other news outlets and with the BBC’s own reporting on other conflicts.
Bias Detection in Media: Traditional Models vs. Transformers in Analyzing Social Media Coverage of the Israeli-Gaza Conflict
Marryam Yahya Mohammed | Esraa Ismail Mohamed | Mariam Nabil Esmat | Yomna Ashraf Nagib | Nada Ahmed Radwan | Ziad Mohamed Elshaer | Ensaf Hussein Mohamed
Bias in news reporting significantly influences public perception, particularly in sensitive and polarized contexts such as the Israel-Gaza conflict. Detecting bias in such cases presents unique challenges due to political, cultural, and ideological complexities, which often amplify disparities in reporting. While prior research has addressed media bias and dataset fairness, these approaches inadequately capture the nuanced dynamics of the Israel-Gaza conflict. To address this gap, we propose an NLP-based framework that leverages Nakba narratives as linguistic resources for bias detection in news coverage. Using a multilingual corpus focused on Arabic texts, we apply rigorous data cleaning and pre-processing, along with methods to mitigate imbalanced class distributions that could skew classification outcomes. Our study explores various approaches, including Machine Learning (ML), Deep Learning (DL), Transformer-based architectures, and generative models. The findings demonstrate promising advances in automating bias detection and in enhancing fairness and accuracy in politically sensitive reporting.
NakbaTR: A Turkish NER Dataset for Nakba Narratives
Esma Fatıma Bilgin Tasdemir | Şaziye Betül Özateş
This paper introduces a novel, annotated Named Entity Recognition (NER) dataset derived from a collection of 181 news articles about the Nakba and its witnesses. Because news articles are a prominent source of information on the Nakba in Turkish, they were selected as the primary data source. A total of 4,032 sentences were collected from the websites of two news agencies, Anadolu Ajansı and TRTHaber. We applied a filtering process to ensure that only news articles containing witness testimonies about the ongoing Nakba were included in the dataset. After semi-automatic annotation of entities of type Person, Location, and Organization, we obtained a NER dataset with 2,289 PERSON, 5,875 LOCATION, and 1,299 ORGANIZATION tags. We expect the dataset to be useful for several NLP tasks related to Nakba events, such as sentiment analysis and relation extraction, while providing a new language resource for Turkish. In future work, we aim to extend the dataset with more news articles and additional entity types.
Integrating Argumentation Features for Enhanced Propaganda Detection in Arabic Narratives on the Israeli War on Gaza
Sara Nabhani | Claudia Borg | Khalid Al Khatib | Kurt Micallef
Propaganda significantly shapes public opinion, especially in conflict-driven contexts such as the Israeli-Palestinian conflict. This study explores the integration of argumentation features, such as claims, premises, and major claims, into machine learning models to enhance the detection of propaganda techniques in Arabic media. By leveraging datasets annotated with fine-grained propaganda techniques and employing cross-lingual and multilingual NLP methods, along with GPT-4-based annotations, we demonstrate consistent performance improvements. A qualitative analysis of Arabic media narratives on the Israeli war on Gaza further reveals the model’s ability to identify diverse rhetorical strategies, offering insights into the dynamics of propaganda. These findings emphasize the potential of combining NLP with argumentation features to foster transparency and informed discourse in politically charged settings.