Findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages

Oksana Dereza, Adrian Doyle, Priya Rani, Atul Kr. Ojha, Pádraic Moran, John McCrae


Abstract
This paper discusses the organisation and findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. The shared task was split into constrained and unconstrained tracks. It involved solving either 3 or 5 problems for either 13 or 16 ancient and historical languages. These languages belong to 4 language families and are written in 6 different scripts. There were 14 registrations in total, and 3 teams submitted to each track. Of these 6 submissions, 2 systems were successful in the constrained setting and another 2 in the unconstrained setting. Four teams submitted system description papers. The best average result for morphological feature prediction was about 96%, while the best average results for POS-tagging and lemmatisation were 96% and 94% respectively. At the word level, the winning team reached an average accuracy of only 5.95% across all 16 languages, which demonstrates the difficulty of this problem. At the character level, the best average result over 16 languages was 55.62%.
Anthology ID:
2024.sigtyp-1.19
Volume:
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
160–172
URL:
https://aclanthology.org/2024.sigtyp-1.19
Cite (ACL):
Oksana Dereza, Adrian Doyle, Priya Rani, Atul Kr. Ojha, Pádraic Moran, and John McCrae. 2024. Findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 160–172, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
Findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages (Dereza et al., SIGTYP-WS 2024)
PDF:
https://aclanthology.org/2024.sigtyp-1.19.pdf
Video:
https://aclanthology.org/2024.sigtyp-1.19.mp4