Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches

Maria Sierro, Begoña Altuna, Itziar Gonzalez-Dios


Abstract
In this paper, we evaluate two annotation approaches for the automatic detection and labelling of personal information in legal texts, focusing on the ambiguity of the labels and the homogeneity of the annotations. For this purpose, we built a corpus of 44 case reports from the European Court of Human Rights in Spanish and annotated it following two different approaches: automatic projection of the annotations of an existing English corpus, and manual annotation following our reinterpretation of that corpus's guidelines. Moreover, we use Flair on a Named Entity Recognition task to compare its performance under the two annotation schemes.
Anthology ID:
2024.caldpseudo-1.3
Volume:
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Xuan-Son Vu
Venues:
CALD-pseudo | WS
Publisher:
Association for Computational Linguistics
Pages:
18–24
URL:
https://aclanthology.org/2024.caldpseudo-1.3
Cite (ACL):
Maria Sierro, Begoña Altuna, and Itziar Gonzalez-Dios. 2024. Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 18–24, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches (Sierro et al., CALD-pseudo-WS 2024)
PDF:
https://aclanthology.org/2024.caldpseudo-1.3.pdf