Finetuning Latin BERT for Word Sense Disambiguation on the Thesaurus Linguae Latinae

Piroska Lendvai, Claudia Wick


Abstract
The Thesaurus Linguae Latinae (TLL) is a comprehensive monolingual dictionary that records contextualized meanings and usages of Latin words in antique sources at an unprecedented scale. We created a new dataset based on a subset of sense representations in the TLL, with which we finetuned the Latin-BERT neural language model (Bamman and Burns, 2020) on a supervised Word Sense Disambiguation task. We observe that the contextualized BERT representations finetuned on TLL data score better than static embeddings used in a bidirectional LSTM classifier on the same dataset, and that our per-lemma BERT models achieve higher and more robust performance than reported by Bamman and Burns (2020) based on data from a bilingual Latin dictionary. We demonstrate the differences in sense organization principles between these two lexical resources, and report on our dataset construction and improved evaluation methodology.
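
To illustrate the kind of setup the abstract describes, below is a minimal sketch (not the authors' code) of finetuning a BERT-style Latin model as a per-lemma sense classifier, i.e. supervised WSD framed as sentence classification over the sense labels of one lemma. It assumes the Hugging Face transformers library; the model path, label set, and example sentences are placeholders, not data from the paper.

# Minimal sketch: per-lemma WSD as sequence classification with a
# BERT-style Latin checkpoint. MODEL_PATH and the toy examples are
# placeholders, not the authors' data or released code.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_PATH = "path/to/latin-bert"  # placeholder: local Latin-BERT checkpoint

# Toy training examples for one lemma: (context sentence, sense label index)
train_examples = [
    ("in principio creavit deus caelum et terram", 0),
    ("caelum serenum erat", 1),
]
num_senses = 2

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

class SenseDataset(Dataset):
    """Wraps (sentence, sense label) pairs as tokenized model inputs."""
    def __init__(self, examples):
        self.examples = examples
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        text, label = self.examples[idx]
        enc = tokenizer(text, truncation=True, padding="max_length",
                        max_length=128, return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item

# One classification head per lemma, with one output unit per TLL sense.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_PATH, num_labels=num_senses)

args = TrainingArguments(output_dir="wsd-lemma-model",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args,
        train_dataset=SenseDataset(train_examples)).train()

In this framing, one classifier is trained per lemma, which matches the "per-lemma BERT models" mentioned above; evaluation would compare predicted sense indices against the TLL-derived gold labels on held-out contexts.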
Anthology ID:
2022.cogalex-1.5
Volume:
Proceedings of the Workshop on Cognitive Aspects of the Lexicon
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Michael Zock, Emmanuele Chersoni, Yu-Yin Hsu, Enrico Santus
Venue:
CogALex
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
37–41
URL:
https://aclanthology.org/2022.cogalex-1.5
Cite (ACL):
Piroska Lendvai and Claudia Wick. 2022. Finetuning Latin BERT for Word Sense Disambiguation on the Thesaurus Linguae Latinae. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon, pages 37–41, Taipei, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Finetuning Latin BERT for Word Sense Disambiguation on the Thesaurus Linguae Latinae (Lendvai & Wick, CogALex 2022)
PDF:
https://aclanthology.org/2022.cogalex-1.5.pdf