Adapting transformer models to morphological tagging of two highly inflectional languages: a case study on Ancient Greek and Latin

Alek Keersmaekers, Wouter Mercelis


Abstract
Natural language processing for Greek and Latin, inflectional languages with small corpora, requires special techniques. For morphological tagging, transformer models show promise, but the best way to use these models is unclear. For both languages, this paper examines the impact of using morphological lexica, training different model types (a single model with a combined feature tag, multiple models for separate features, and a multi-task model for all features), and adding linguistic constraints. We find that, although simply fine-tuning transformers to predict a monolithic tag may already yield decent results, each of these adaptations can further improve tagging accuracy.
Anthology ID:
2024.ml4al-1.17
Volume:
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
Month:
August
Year:
2024
Address:
Hybrid in Bangkok, Thailand and online
Editors:
John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
Venues:
ML4AL | WS
Publisher:
Association for Computational Linguistics
Pages:
165–176
URL:
https://aclanthology.org/2024.ml4al-1.17
Cite (ACL):
Alek Keersmaekers and Wouter Mercelis. 2024. Adapting transformer models to morphological tagging of two highly inflectional languages: a case study on Ancient Greek and Latin. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 165–176, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
Cite (Informal):
Adapting transformer models to morphological tagging of two highly inflectional languages: a case study on Ancient Greek and Latin (Keersmaekers & Mercelis, ML4AL-WS 2024)
PDF:
https://aclanthology.org/2024.ml4al-1.17.pdf