NakbaTR: A Turkish NER Dataset for Nakba Narratives

Esma Fatıma Bilgin Tasdemir, Şaziye Betül Özateş


Abstract
This paper introduces a novel, annotated Named Entity Recognition (NER) dataset derived from a collection of 181 news articles about the Nakba and its witnesses. Given their prominence as a primary source of information on the Nakba in Turkish, news articles were selected as the primary data source. Some 4,032 news sentences were collected from the websites of two news agencies, Anadolu Ajansı and TRTHaber. We applied a filtering process to ensure that only news articles containing witness testimonies regarding the ongoing Nakba are included in the dataset. After semi-automatic annotation of entities of type Person, Location, and Organization, we obtained a NER dataset with 2,289 PERSON, 5,875 LOCATION, and 1,299 ORGANIZATION tags. We expect the dataset to be useful in several NLP tasks, such as sentiment analysis and relation extraction for the Nakba event, while providing a new language resource for Turkish. As future work, we aim to improve the dataset by increasing the number of news articles and entity types.
Anthology ID:
2025.nakbanlp-1.13
Volume:
Proceedings of the first International Workshop on Nakba Narratives as Language Resources
Month:
January
Year:
2025
Address:
Abu Dhabi
Editors:
Mustafa Jarrar, Nizar Habash, Mo El-Haj
Venues:
NakbaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
122–126
URL:
https://aclanthology.org/2025.nakbanlp-1.13/
Cite (ACL):
Esma Fatıma Bilgin Tasdemir and Şaziye Betül Özateş. 2025. NakbaTR: A Turkish NER Dataset for Nakba Narratives. In Proceedings of the first International Workshop on Nakba Narratives as Language Resources, pages 122–126, Abu Dhabi. Association for Computational Linguistics.
Cite (Informal):
NakbaTR: A Turkish NER Dataset for Nakba Narratives (Bilgin Tasdemir & Özateş, NakbaNLP 2025)
PDF:
https://aclanthology.org/2025.nakbanlp-1.13.pdf