Delyth Prys


2022

Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Theodorus Fransen | William Lamb | Delyth Prys
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus
Stephen Russell | Dewi Jones | Delyth Prys
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

This paper presents the design, collection and verification of a bilingual text-to-speech synthesis corpus for Welsh and English. The ever-expanding voice collection currently contains almost 10 hours of recordings from a bilingual, phonetically balanced text corpus. The speakers consist of a professional voice actor and three amateur contributors, with male and female accents from north and south Wales. This corpus provides audio-text pairs for building and training high-quality bilingual Welsh-English neural TTS systems. We describe the process by which we created a phonetically balanced prompt set and the challenges of attempting to collate such a dataset during the COVID-19 pandemic. Our initial findings from validating the corpus by implementing state-of-the-art TTS models are presented. This corpus represents the first open-source Welsh language corpus large enough to capitalise on neural TTS architectures.

2020

Adapting a Welsh Terminology Tool to Develop a Cornish Dictionary
Delyth Prys
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Cornish and Welsh are closely related Celtic languages, and this paper provides a brief description of a recent project to publish an online bilingual English/Cornish dictionary, the Gerlyver Kernewek, based on similar work previously undertaken for Welsh. Both languages are endangered, Cornish critically so, but both can benefit from the use of language technology. Welsh has previous experience of using language technologies for language revitalization, and this experience is now being used to help the Cornish language create new tools and resources, including lexicographical ones. These help a dispersed team of language specialists and editors, many of them working in a voluntary capacity, to collaborate online. Details are given of the Maes T dictionary writing and publication platform, originally developed for Welsh, and of some of the adaptations that had to be made to accommodate the specific needs of Cornish, including its use of Middle and Late varieties due to its development as a revived language.

2019

Proceedings of the Celtic Language Technology Workshop
Teresa Lynn | Delyth Prys | Colin Batchelor | Francis Tyers
Proceedings of the Celtic Language Technology Workshop

2016

Cysill Ar-lein: A Corpus of Written Contemporary Welsh Compiled from an On-line Spelling and Grammar Checker
Delyth Prys | Gruffudd Prys | Dewi Bryn Jones
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the use of a free, on-line spelling and grammar checker as a vehicle for collecting a significant corpus of text (31 million words and rising) for academic research in the context of less-resourced languages, where such data are often unavailable in sufficient quantities. It describes two versions of the corpus: the texts as submitted, prior to the correction process, and the texts following the user’s incorporation of any suggested changes. An overview of the corpus’ contents is given, and an analysis of use, including usage statistics, is also provided. Issues surrounding privacy and the anonymization of data are explored, as is the data’s potential use for linguistic analysis, lexical research and language modelling. The method used for gathering this corpus is believed to be unique, and the corpus is a valuable addition to corpus studies in a minority language.

2014

Developing further speech recognition resources for Welsh
Sarah Cooper | Dewi Jones | Delyth Prys
Proceedings of the First Celtic Language Technology Workshop

DECHE and the Welsh National Corpus Portal
Delyth Prys | Dewi Jones | Mared Roberts
Proceedings of the First Celtic Language Technology Workshop