Weakly Supervised Named Entity Recognition for Historical Texts

Marco Sorbi, Laurent Moccozet, Stephane Marchand-Maillet


Abstract
Named Entity Recognition (NER) has emerged as a critical task in natural language processing, particularly for extracting meaningful information from unstructured text. Although traditional approaches rely heavily on large annotated datasets, recent advances have explored weak supervision techniques to address the limitations of resource-intensive annotation processes. Historical texts pose unique challenges for this task because of their linguistic peculiarities. Several supervised approaches exist for this domain, but they require lengthy manual annotation of the documents of interest by domain experts. To address this issue, this paper explores how recent weakly supervised NER techniques can be adapted to historical texts and analyzes their suitability for this domain. The experiments show that domain-specific architectures can be effectively trained on low-resource corpora with weak supervision over a small set of entity labels. Using only 10% of the annotations, these architectures retain more than 80% of the F1-score achieved with full supervision.
Anthology ID:
2026.latechclfl-1.6
Volume:
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues:
LaTeCH-CLfL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
48–65
URL:
https://aclanthology.org/2026.latechclfl-1.6/
Cite (ACL):
Marco Sorbi, Laurent Moccozet, and Stephane Marchand-Maillet. 2026. Weakly Supervised Named Entity Recognition for Historical Texts. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 48–65, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Weakly Supervised Named Entity Recognition for Historical Texts (Sorbi et al., LaTeCH-CLfL 2026)
PDF:
https://aclanthology.org/2026.latechclfl-1.6.pdf
Supplementary material:
2026.latechclfl-1.6.SupplementaryMaterial.txt
2026.latechclfl-1.6.SupplementaryMaterial.zip