2022
Handwritten Text Recognition (HTR) for Irish-Language Folklore
Brian Ó Raghallaigh | Andrea Palandri | Críostóir Mac Cárthaigh
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
In this paper we present our method for digitising a large collection of handwritten Irish-language texts as part of a project to mine information from a large corpus of Irish and Scottish Gaelic folktales. The handwritten texts form part of the Main Manuscript Collection of the National Folklore Collection of Ireland and contain handwritten transcriptions of oral folklore collected in Ireland in the 20th century. With the goal of creating a large text corpus of the Irish-language folktales contained within this collection, our method involves scanning the pages of the physical volumes and digitising the text on these pages using Transkribus, a platform for the recognition of historical documents. Given the nature of the collection, the approach we have taken involves the creation of individual text recognition models for multiple collectors’ hands. This approach was motivated by the fact that a relatively small number of collectors contributed the bulk of the material, while the differences between collectors in style, layout and orthography were difficult to reconcile within a single handwriting model. We present our preliminary results along with a discussion of the viability of using crowdsourced correction to improve our HTR models.
2019
Improving full-text search results on dúchas.ie using language technology
Brian Ó Raghallaigh | Kevin Scannell | Meghan Dowling
Proceedings of the Celtic Language Technology Workshop
2014
Proceedings of the First Celtic Language Technology Workshop
John Judge | Teresa Lynn | Monica Ward | Brian Ó Raghallaigh
Proceedings of the First Celtic Language Technology Workshop
Developing high-end reusable tools and resources for Irish-language terminology, lexicography, onomastics (toponymy), folkloristics, and more, using modern web and database technologies
Brian Ó Raghallaigh | Michal Boleslav Měchura
Proceedings of the First Celtic Language Technology Workshop