Adrian Doyle


2024

Findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages
Oksana Dereza | Adrian Doyle | Priya Rani | Atul Kr. Ojha | Pádraic Moran | John McCrae
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This paper discusses the organisation and findings of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. The shared task was split into the constrained and unconstrained tracks and involved solving either 3 or 5 problems for either 13 or 16 ancient and historical languages belonging to 4 language families, and making use of 6 different scripts. There were 14 registrations in total, of which 3 teams submitted to each track. Out of these 6 submissions, 2 systems were successful in the constrained setting and another 2 in the unconstrained setting, and 4 system description papers were submitted by different teams. The best average result for morphological feature prediction was about 96%, while the best average results for POS-tagging and lemmatisation were 96% and 94% respectively. At the word level, the winning team could not achieve a higher average accuracy across all 16 languages than 5.95%, which demonstrates the difficulty of this problem. At the character level, the best average result over 16 languages was 55.62%.

2023

The Cardamom Workbench for Historical and Under-Resourced Languages
Adrian Doyle | Theodorus Fransen | Bernardo Stearns | John P. McCrae | Oksana Dereza | Priya Rani
Proceedings of the 4th Conference on Language, Data and Knowledge

Findings of the SIGTYP 2023 Shared task on Cognate and Derivative Detection For Low-Resourced Languages
Priya Rani | Koustava Goswami | Adrian Doyle | Theodorus Fransen | Bernardo Stearns | John P. McCrae
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This paper describes the structure and findings of the SIGTYP 2023 shared task on cognate and derivative detection for low-resourced languages, which was divided into a supervised and an unsupervised sub-task. Participants were asked to submit their final predictions on the test data. A total of nine teams registered for the shared task, seven of which registered for both sub-tasks. Only two participants ended up submitting system descriptions, and only one of them submitted systems for both sub-tasks. While all systems show rather promising performance, all fall within the baseline score for the supervised sub-task. However, the system submitted for the unsupervised sub-task outperforms the baseline score.

2019

Adapting Term Recognition to an Under-Resourced Language: the Case of Irish
John P. McCrae | Adrian Doyle
Proceedings of the Celtic Language Technology Workshop

A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles
Adrian Doyle | John P. McCrae | Clodagh Downey
Proceedings of the Celtic Language Technology Workshop