Alejandro Sierra Múnera

Also published as: Alejandro Sierra-Múnera


2024

The Effects of Data Quality on Named Entity Recognition
Divya Bhadauria | Alejandro Sierra Múnera | Ralf Krestel
Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024)

The extraction of valuable information from the vast amount of digital data available today has become increasingly important, making Named Entity Recognition (NER) models an essential component of information extraction tasks. This underscores the importance of understanding the factors that can compromise the performance of these models. Many studies have examined the impact of data annotation errors on NER models, but the broader implications of overall data quality remain unexplored. In this work, we evaluate the robustness of three prominent NER models on datasets with varying amounts and types of textual noise. The results show that model performance declines as the noise in a dataset increases, with some noise types having only a minor impact and others causing a significant drop in performance. The findings of this research can serve as a foundation for building robust NER systems by enhancing dataset quality beforehand.
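
As an illustration of the kind of robustness evaluation the abstract describes, below is a minimal Python sketch that injects character-level noise into a token-labeled sentence at increasing rates. The noise operations (swap, deletion, substitution), the rates, and the example sentence are illustrative assumptions, not the paper's actual noise types or datasets.

import random

random.seed(42)

def inject_typos(tokens, noise_rate=0.1):
    """Corrupt a fraction of tokens with a random character-level edit
    (swap, deletion, or substitution) to simulate noisy text.

    Illustrative noise model only; noise_rate and the edit operations
    are assumptions, not the paper's procedure.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    noisy = []
    for tok in tokens:
        if len(tok) > 1 and random.random() < noise_rate:
            i = random.randrange(len(tok) - 1)
            op = random.choice(["swap", "delete", "substitute"])
            if op == "swap":
                tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
            elif op == "delete":
                tok = tok[:i] + tok[i + 1:]
            else:
                tok = tok[:i] + random.choice(alphabet) + tok[i + 1:]
        noisy.append(tok)
    return noisy

# Token-level corruption leaves the token count unchanged, so the gold
# NER labels stay aligned with the noisy tokens.
tokens = ["Leonardo", "painted", "the", "Last", "Supper", "in", "Milan"]
labels = ["B-PER", "O", "O", "B-WORK", "I-WORK", "O", "B-LOC"]
for rate in (0.0, 0.1, 0.3):
    print(rate, inject_typos(tokens, noise_rate=rate))

Evaluating a fixed model on copies of a test set corrupted at several such rates, as in the loop above, gives the performance-versus-noise curves that a study like this one would compare across models.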

2021

Did You Enjoy the Last Supper? An Experimental Study on Cross-Domain NER Models for the Art Domain
Alejandro Sierra-Múnera | Ralf Krestel
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

Named entity recognition (NER) is an important task that constitutes the basis for multiple downstream natural language processing tasks. Traditional machine learning approaches to NER rely on annotated corpora. However, such corpora are largely available only for standard domains, e.g., news articles. Domain-specific NER often lacks annotated training data, leaving two options of interest: expensive manual annotation or transfer learning. In this paper, we study a selection of cross-domain NER models and evaluate them for use in the art domain, particularly for recognizing artwork titles in digitized art-historic documents. To evaluate the models, we employ a variety of source domain datasets and analyze how each one impacts the performance of the different models in our target domain. Additionally, we analyze the impact of the source domain’s entity types, seeking a better understanding of how the transfer learning models adapt different source entity types into our target entity types.
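
To illustrate the entity-type analysis mentioned in the abstract, the following minimal Python sketch remaps source-domain entity types onto art-domain types and tallies a type-confusion count. The tag sets and the mapping are invented for illustration and do not reflect the paper's actual datasets or models.

from collections import Counter

# Hypothetical mapping from a news-domain tag set to art-domain types.
SOURCE_TO_TARGET = {
    "PER": "PERSON",        # artists, patrons
    "LOC": "LOCATION",      # museums, cities
    "MISC": "WORK_OF_ART",  # a coarse guess: titles often land in MISC
}

def remap_predictions(pred_types):
    """Project source-domain entity types onto the target inventory,
    mapping types with no plausible counterpart to None."""
    return [SOURCE_TO_TARGET.get(t) for t in pred_types]

def type_confusion(gold, remapped):
    """Count (gold, predicted) type pairs to see which source types a
    transferred model tends to assign to each target type."""
    return Counter(zip(gold, remapped))

gold = ["WORK_OF_ART", "PERSON", "WORK_OF_ART", "LOCATION"]
pred = ["MISC", "PER", "PER", "LOC"]  # raw source-domain predictions
print(type_confusion(gold, remap_predictions(pred)))

A confusion count like this makes it visible when, for example, a model trained on news data systematically labels artwork titles as persons, which is the kind of source-to-target type adaptation the study analyzes.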