Exploring automatic terminology extraction from historical medical data

Leonardo Zilio, Maria José Bocorny Finatto


Abstract
This paper analyzes the performance of several terminology extraction methods when confronted with historical specialized texts that do not conform to modern orthographic norms. We tested two extraction methods based on linguistic patterns, four prompt-based generative artificial intelligence (GenAI) models, and one BERT-like model. Some of these models were fine-tuned for terminology extraction, and one of them is specialized in extracting medical terms from documents written in Portuguese. For the GenAI models, we tested four different prompting strategies. As a test set, we used chapter fifteen of the second part of the book Aviso á Gente do Mar sobre a sua Saude [Advice to Sea People about their Health], originally written in French by G. Mauran at the end of the 18th century, and translated and adapted to Portuguese in 1794. The chapter was annotated with terminology, and the evaluation was conducted independently in terms of both f-measure and pure precision, to observe whether the automatic extraction methods could complement the manual token-based annotation. Results show that using automatic extraction methods to complement the manual annotation can improve coverage, and that individual models do not achieve high extraction quality; however, by combining two or more models, a recall of more than 90% could be achieved on the test data.
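The evaluation idea in the abstract (precision, recall, and f-measure per model, plus the recall gain from combining models) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: terms are compared as exact strings in sets, whereas the paper describes a token-based annotation, and the term lists and function names below are hypothetical toy examples, not data or code from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 (an assumed matching scheme)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def combine(*outputs):
    """Union of the term sets extracted by several models."""
    return set().union(*outputs)


# Hypothetical toy data: a gold annotation and two model outputs.
gold_terms = {"febre", "escorbuto", "sangria", "purgante"}
model_a = {"febre", "sangria", "navio"}
model_b = {"escorbuto", "purgante", "febre"}

for name, terms in [("A", model_a), ("B", model_b),
                    ("A+B", combine(model_a, model_b))]:
    p, r, f = precision_recall_f1(terms, gold_terms)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy setting, the union reaches higher recall than either model alone while inheriting the false positives of both, which is the trade-off a combination strategy has to accept.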
Anthology ID:
2026.propor-2.37
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
282–292
URL:
https://aclanthology.org/2026.propor-2.37/
Cite (ACL):
Leonardo Zilio and Maria José Bocorny Finatto. 2026. Exploring automatic terminology extraction from historical medical data. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2, pages 282–292, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
Exploring automatic terminology extraction from historical medical data (Zilio & Finatto, PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-2.37.pdf