Harald Berthelsen


2022

Reading Assistance through LARA, the Learning And Reading Assistant
Elham Akhlaghi | Ingibjörg Iða Auðunardóttir | Branislav Bédi | Hakeem Beedar | Harald Berthelsen | Cathy Chua | Catia Cucchiarini | Brynjarr Eyjólfsson | Nedelina Ivanova | Christèle Maizonniaux | Neasa Ní Chiaráin | Manny Rayner | John Sloan | Sigurður Vigfússon | Ghil’ad Zuckermann
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

We present an overview of LARA, the Learning And Reading Assistant, an open source platform for easy creation and use of multimedia annotated texts designed to support the improvement of reading skills. The paper is divided into three parts. In the first, we give a brief summary of LARA’s processing. In the second, we describe some generic functionality especially relevant for reading assistance: support for phonetically annotated texts, support for image-based texts, and integrated production of text-to-speech (TTS) generated audio. In the third, we outline some of the larger projects so far carried out with LARA, involving development of content for learning second and foreign (L2) languages such as Icelandic, Farsi, Irish, Old Norse and the Australian Aboriginal language Barngarla, where the issues involved overlap with those that arise when trying to help students improve first-language (L1) reading skills. All software and almost all content is freely available.

Using Speech and NLP Resources to build an iCALL platform for a minority language, the story of An Scéalaí, the Irish experience to date
Neasa Ní Chiaráin | Oisín Nolan | Madeleine Comtois | Neimhin Robinson Gunning | Harald Berthelsen | Ailbhe Ní Chasaide
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

This paper describes how emerging linguistic resources and technologies can be used to build a language learning platform for Irish, an endangered language. This platform, An Scéalaí, harvests learner corpora - a vital resource both to study the stages of learners’ language acquisition and to guide future platform development. A technical description of the platform is provided, including details of how different speech technologies and linguistic resources are fused to provide a holistic learner experience. The active continuous participation of the community, and platform evaluations by learners and teachers, are discussed.

Automatic Speech Recognition for Irish: the ABAIR-ÉIST System
Liam Lonergan | Mengjie Qian | Harald Berthelsen | Andy Murphy | Christoph Wendler | Neasa Ní Chiaráin | Christer Gobl | Ailbhe Ní Chasaide
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

This paper describes ÉIST, an automatic speech recogniser for Irish, developed as part of the ongoing ABAIR initiative, combining (1) acoustic models, (2) pronunciation lexicons and (3) language models into a hybrid system. A priority for now is a system that can deal with the multiple diverse native-speaker dialects. Consequently, (1) was built using predominately native-speaker speech, which included earlier recordings used for synthesis development as well as more diverse recordings obtained using the MíleGlór platform. The pronunciation variation across the dialects is a particular challenge in the development of (2) and is explored by testing both Trans-dialect and Multi-dialect letter-to-sound rules. Two approaches to language modelling (3) are used in the hybrid system, a simple n-gram model and recurrent neural network lattice rescoring, the latter garnering impressive performance improvements. The system is evaluated using a test set that comprises both native and non-native speakers, which allows for some inferences to be made on the performance of the system on both cohorts.

Celtic CALL: strengthening the vital role of education for language transmission
Neasa Ní Chiaráin | Madeleine Comtois | Oisín Nolan | Neimhin Robinson-Gunning | John Sloan | Harald Berthelsen | Ailbhe Ní Chasaide
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

In this paper, we present the Irish language learning platform, An Scéalaí, an intelligent Computer-Assisted Language Learning (iCALL) system which incorporates speech and language technologies in ways that promote the holistic development of the language skills - writing, listening, reading, and speaking. The technologies offer the advantage of extensive feedback in spoken and written form, enabling learners to improve their production. The system works equally as a classroom-based tool and as a standalone platform for the autonomous learner. Given the key role of education for the transmission of all the Celtic languages, it is vital that digital technologies be harnessed to maximise the effectiveness of language teaching/learning. An Scéalaí has been used by large numbers of learners and teachers and has received very positive feedback. It is built as a modular system which allows existing and newly emerging technologies to be readily integrated, even if those technologies are still in development phase. The architecture is largely language-independent, and as an open-source system, it is hoped that it can be usefully deployed in other Celtic languages.

AAC don Ghaeilge: the Prototype Development of Speech-Generating Assistive Technology for Irish
Emily Barnes | Oisín Morrin | Ailbhe Ní Chasaide | Julia Cummins | Harald Berthelsen | Andy Murphy | Muireann Nic Corcráin | Claire O’Neill | Christer Gobl | Neasa Ní Chiaráin
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

This paper describes the prototype development of an Alternative and Augmentative Communication (AAC) system for the Irish language. This system allows users to communicate using the ABAIR synthetic voices, by selecting a series of words or images. Similar systems are widely available in English and are often used by autistic people, as well as by people with Cerebral Palsy, Alzheimer’s and Parkinson’s disease. A dual-pronged approach to development has been adopted: this involves (i) the initial short-term prototype development that targets the immediate needs of specific users, as well as considerations for (ii) the longer term development of a bilingual AAC system which will suit a broader range of users with varying linguistic backgrounds, age ranges and needs. This paper describes the design considerations and the implementation steps in the current system. Given the substantial differences in linguistic structures in Irish and English, the development of a bilingual system raises many research questions and avenues for future development.

Using the LARA Little Prince to compare human and TTS audio quality
Elham Akhlaghi | Ingibjörg Iða Auðunardóttir | Anna Bączkowska | Branislav Bédi | Hakeem Beedar | Harald Berthelsen | Cathy Chua | Catia Cucchiarini | Hanieh Habibi | Ivana Horváthová | Junta Ikeda | Christèle Maizonniaux | Neasa Ní Chiaráin | Chadi Raheb | Manny Rayner | John Sloan | Nikos Tsourakis | Chunlin Yao
Proceedings of the Thirteenth Language Resources and Evaluation Conference

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2x2 cross product of the conditions dialogue, not-dialogue and humour, not-humour. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was however still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.

2020

Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages
Elham Akhlaghi | Branislav Bédi | Fatih Bektaş | Harald Berthelsen | Matthias Butterweck | Cathy Chua | Catia Cucchiarini | Gülşen Eryiğit | Johanna Gerlach | Hanieh Habibi | Neasa Ní Chiaráin | Manny Rayner | Steinþór Steingrímsson | Helmer Strik
Proceedings of the Twelfth Language Resources and Evaluation Conference

LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners. This involves semi-automatically tagging the text, adding other annotations and recording audio. The platform is suitable for creating texts in multiple languages via crowdsourcing techniques that can be used for teaching a language via reading and listening. We present results of initial experiments by various collaborators where we measure the time required to produce substantial LARA resources, up to the length of short novels, in Dutch, English, Farsi, French, German, Icelandic, Irish, Swedish and Turkish. The first results are encouraging. Although there are some startup problems, the conversion task seems manageable for the languages tested so far. The resulting enriched texts are posted online and are freely available in both source and compiled form.