The Challenge of Performing Ontology-driven Entity Extraction in Real-world Unstructured Textual Data from the Domain of Dementia

Sumaiya Suravee, Carsten Oliver Schmidt, Kristina Yordanova


Abstract
Named entity recognition allows the automated extraction of structured domain-related information from unstructured textual data. Our study explores the task of ontology-driven entity recognition, a sequence labelling process for custom named entity recognition for the domain of dementia, specifically from unstructured forum texts where unprofessional caregivers of people with dementia discuss the challenges they face related to agitation. The targeted corpus is loosely structured, contains ambiguous sentences and vocabulary that does not match the agitation-related medical vocabulary. To address the above challenges, we propose a pipeline that involves the following steps: 1) development of an annotation codebook; 2) annotation of a textual corpus collected from dementia forums, consisting of 45,216 sentences (775 questions and 5571 answers); 3) data augmentation to reduce the imbalance in the corpus; 4) training of a bidirectional LSTM model and a transformer model; 5) comparison of the results with those from few shot- and zero-shot based prompt engineering techniques using a pretrained large language model (LLaMa 3). The results showed that LLaMa 3 was more robust than traditional neural networks and transformer models in detecting underrepresented entities. Furthermore, the study demonstrates that data augmentation improves the entity recognition task when fine-tuning deep learning models. The paper illustrates the challenges of ontology-driven entity recognition in real-world datasets and proposes a roadmap to addressing them that is potentially transferable to other real-world domains.
Anthology ID:
2025.ranlp-1.139
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
1205–1214
Language:
URL:
https://aclanthology.org/2025.ranlp-1.139/
DOI:
Bibkey:
Cite (ACL):
Sumaiya Suravee, Carsten Oliver Schmidt, and Kristina Yordanova. 2025. The Challenge of Performing Ontology-driven Entity Extraction in Real-world Unstructured Textual Data from the Domain of Dementia. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1205–1214, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
The Challenge of Performing Ontology-driven Entity Extraction in Real-world Unstructured Textual Data from the Domain of Dementia (Suravee et al., RANLP 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.ranlp-1.139.pdf