Part-of-Speech Tagging of 16th-Century Latin with GPT

Elina Stüssi, Phillip Ströbel


Abstract
Part-of-speech tagging is foundational to natural language processing, transcending mere linguistic functions. However, taggers optimized for Classical Latin struggle when faced with diverse linguistic eras shaped by the language's evolution. Exploring 16th-century Latin from the correspondence and assessing five Latin treebanks, we focused on carefully evaluating tagger accuracy and refining Large Language Models for improved performance in this nuanced linguistic context. Our discoveries unveiled the competitive accuracies of different versions of GPT, particularly after fine-tuning. Notably, our best fine-tuned model soared to an average accuracy of 88.99% over the treebank data, underscoring the remarkable adaptability and learning capabilities when fine-tuned to the specific intricacies of Latin texts. Next to emphasising GPT’s part-of-speech tagging capabilities, our second aim is to strengthen taggers' adaptability across different periods. We establish solid groundwork for using Large Language Models in specific natural language processing tasks where part-of-speech tagging is often employed as a pre-processing step. This work significantly advances the use of modern language models in interpreting historical language, bridging the gap between past linguistic epochs and modern computational linguistics.
Anthology ID:
2024.latechclfl-1.18
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
196–206
URL:
https://aclanthology.org/2024.latechclfl-1.18
Cite (ACL):
Elina Stüssi and Phillip Ströbel. 2024. Part-of-Speech Tagging of 16th-Century Latin with GPT. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 196–206, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Part-of-Speech Tagging of 16th-Century Latin with GPT (Stüssi & Ströbel, LaTeCHCLfL-WS 2024)
PDF:
https://aclanthology.org/2024.latechclfl-1.18.pdf
Supplementary material:
2024.latechclfl-1.18.SupplementaryMaterial.zip