Recognizing the Structure and Content of Hungarian Civil Registers

Kata Ágnes Szűcs, Noémi Vadász, Zsolt Béla Záros


Abstract
The study evaluates key steps of a system that processes data from digitized Hungarian state register records (1895–1980) into an SQL database. It examines how template selection and post-processing affect data accessibility and integration. The research details the compiled datasets, the annotation process, and the evaluation functions used to measure processing quality, with particular attention to how these two steps can improve the overall workflow and the accuracy of the published data. The proposed evaluation method for publishing structured data can serve as a model for similar projects.
Anthology ID:
2025.ranlp-1.140
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1215–1223
URL:
https://aclanthology.org/2025.ranlp-1.140/
Cite (ACL):
Kata Ágnes Szűcs, Noémi Vadász, and Zsolt Béla Záros. 2025. Recognizing the Structure and Content of Hungarian Civil Registers. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1215–1223, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Recognizing the Structure and Content of Hungarian Civil Registers (Szűcs et al., RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.140.pdf