Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore

Edward Gow-Smith, Mark McConville, William Gillies, Jade Scott, Roibeard Ó Maolalaigh


Abstract
The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model, and discuss directions for future work.
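As an illustration of the metric named in the abstract, the following is a minimal sketch of a corpus-level character-level BLEU, in which every character is treated as a token. The abstract does not state the n-gram order, smoothing, or tool behind the reported score, so the function name `char_bleu`, the default `max_n=4`, the absence of smoothing, and the placeholder strings are all assumptions made for illustration, not the authors' evaluation setup.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count the character n-grams of a string (each character is one token)."""
    chars = list(text)
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level character BLEU on a 0-100 scale, unsmoothed (assumed setup)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the reference.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Placeholder strings only, not data from the manuscript.
print(char_bleu(["abcdx"], ["abcde"]))
```

Scoring over characters rather than words suits word-level transliteration, where each example is a single word and a word-level n-gram score would reduce to exact match.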
Anthology ID:
2022.cltw-1.13
Volume:
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Theodorus Fransen, William Lamb, Delyth Prys
Venue:
CLTW
Publisher:
European Language Resources Association
Pages:
94–98
URL:
https://aclanthology.org/2022.cltw-1.13
Cite (ACL):
Edward Gow-Smith, Mark McConville, William Gillies, Jade Scott, and Roibeard Ó Maolalaigh. 2022. Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore. In Proceedings of the 4th Celtic Language Technology Workshop within LREC2022, pages 94–98, Marseille, France. European Language Resources Association.
Cite (Informal):
Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore (Gow-Smith et al., CLTW 2022)
PDF:
https://aclanthology.org/2022.cltw-1.13.pdf