2024
Ensembles of Hybrid and End-to-End Speech Recognition.
Aditya Kamlesh Parikh, Louis ten Bosch, Henk van den Heuvel
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We propose a method to combine the hybrid Kaldi-based Automatic Speech Recognition (ASR) system with the end-to-end wav2vec 2.0 XLS-R ASR system using confidence measures. Our research focuses on the low-resource Irish language. Given the limited open-source resources available, neither the standalone hybrid ASR system nor the end-to-end ASR system can achieve optimal performance. By applying the Recognizer Output Voting Error Reduction (ROVER) technique, we illustrate how ensemble learning can facilitate mutual error correction between the two ASR systems. This paper outlines strategies for merging the hybrid Kaldi ASR model and the end-to-end XLS-R model with the help of confidence scores. Because contemporary state-of-the-art end-to-end ASR models tend to produce overconfident predictions, we use a Rényi entropy-based confidence measure, tuned with temperature scaling, to align the end-to-end confidence scores with those of the Kaldi ASR system. Although there was no significant difference in Word Error Rate (WER) between the hybrid and end-to-end ASR systems, ensembling them through ROVER achieved a notable reduction in WER: an almost 14% Word Error Rate Reduction (WERR) on our primary test set and approximately 20% WERR on other noisy and imbalanced test data.
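The entropy-based confidence step in this abstract can be illustrated with a minimal sketch. It assumes frame-level logits from a CTC-style end-to-end model, applies a temperature-scaled softmax, computes the Rényi entropy H_α(p) = log(Σ p_i^α) / (1 − α), and normalizes it by its maximum value log V (reached by the uniform distribution over a vocabulary of size V) so that confidence lies in [0, 1]. The default α = 0.25, the linear normalization, the minimum-over-frames word aggregation and all function names are hypothetical choices for illustration, not necessarily the authors' exact formulation; how the temperature is tuned against Kaldi confidences is not shown. A helper for the WERR figures quoted in the abstract is included.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: T > 1 flattens the distribution, T < 1 sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renyi_entropy(probs: Sequence[float], alpha: float) -> float:
    """Rényi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def frame_confidence(logits: Sequence[float], alpha: float = 0.25,
                     temperature: float = 1.0) -> float:
    """Hypothetical normalization: 1 - H_alpha / log V.

    Gives 0 for a uniform output and approaches 1 for a near one-hot output.
    """
    probs = softmax(logits, temperature)
    if len(probs) < 2:
        raise ValueError("vocabulary must contain at least two symbols")
    return 1.0 - renyi_entropy(probs, alpha) / math.log(len(probs))


def word_confidence(frames: Sequence[Sequence[float]], alpha: float = 0.25,
                    temperature: float = 1.0) -> float:
    """Hypothetical aggregation: a word is as confident as its least confident frame."""
    return min(frame_confidence(f, alpha, temperature) for f in frames)


def werr(wer_baseline: float, wer_system: float) -> float:
    """Relative word error rate reduction of a system over a baseline."""
    return (wer_baseline - wer_system) / wer_baseline
```

For example, `werr(0.30, 0.258)` gives about 0.14, i.e. a 14% WERR (illustrative numbers, not results from the paper). Raising `temperature` flattens the softmax and lowers `frame_confidence`, which is the lever temperature scaling offers against overconfident end-to-end scores.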
2023
SignON: Sign Language Translation. Progress and challenges.
Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núnez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
SignON (https://signon-project.eu/) is a Horizon 2020 project, running from 2021 until the end of 2023, that addresses the lack of technology and services for automatic translation between sign languages (SLs) and spoken languages. It does so through an inclusive, human-centric solution, thereby expanding the repertoire of communication media available to deaf, hard of hearing (DHH), and hearing individuals. In this paper, we present an update on the status of the project and describe the approaches developed to address the challenges and peculiarities of SL machine translation (SLMT).
Comparing Modular and End-To-End Approaches in ASR for Well-Resourced and Low-Resourced Languages
Aditya Parikh, Louis ten Bosch, Henk van den Heuvel, Cristian Tejedor-Garcia
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)
2022
A Speech Recognizer for Frisian/Dutch Council Meetings
Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra, Hans Van de Velde
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We developed a bilingual Frisian/Dutch speech recognizer for council meetings in Fryslân (the Netherlands). Both Frisian and Dutch are spoken during these meetings, and speakers frequently code-switch between the two languages. The new speech recognizer is based on an existing Frisian/Dutch speech recognizer named FAME!, which was trained and tested on historical radio broadcasts. Adapting a speech recognizer to the council meeting domain is challenging because of acoustic background noise, speaker overlap, and the jargon typically used in council meetings. To train the new recognizer, we used the radio broadcast material from the development of the FAME! recognizer and added newly created, manually transcribed audio recordings of council meetings from eleven Frisian municipalities, the Frisian provincial council, and the Frisian water board. The council meeting recordings comprise 49 hours of speech: 26 hours of Frisian and 23 hours of Dutch. From the same sources, we also obtained 11 million words of council-meeting text: 1.1 million Frisian words and 9.9 million Dutch words. We describe the methods used to train the new recognizer, report the observed word error rates, and analyze the remaining errors.
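The word error rates reported in evaluations like this one are conventionally computed as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the hypothesis against the reference, and N is the number of reference words. The following is a minimal sketch of that definition; the function name and plain whitespace tokenization are assumptions, and real scoring tools (such as NIST sclite) also apply text normalization that is not shown here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein alignment."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: empty reference).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the council meeting starts now", "the council meeting start now")` returns 0.2: one substitution over five reference words.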
2010
A Speech Corpus for Modeling Language Acquisition: CAREGIVER
Toomas Altosaar, Louis ten Bosch, Guillaume Aimetti, Christos Koniaris, Kris Demuynck, Henk van den Heuvel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
A multilingual speech corpus for modeling language acquisition, called CAREGIVER, has been designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The paper describes the motivation behind the corpus and its design, which relies on current knowledge of infant language acquisition. Instead of recording infants and children, the corpus captures the voices of their primary and secondary caregivers in both infant-directed and adult-directed speech modes, in four languages, as read speech. The paper covers the challenges and methods involved in obtaining prompts of similar complexity and semantics across languages, as well as the normalized recording procedures employed at different locations. The corpus contains nearly 66,000 utterance-based audio files spoken over a two-year period by 17 male and 17 female native speakers of Dutch, English, Finnish, and Swedish. An orthographic transcription is available for every utterance, and time-aligned word and phone annotations exist for many of the sub-corpora. The CAREGIVER corpus will be published via ELRA.
2008
Evaluating the Relationship between Linguistic and Geographic Distances using a 3D Visualization
Folkert de Vriend, Jan Pieter Kunst, Louis ten Bosch, Charlotte Giesbers, Roeland van Hout
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper, we discuss how linguistic and geographic distances can be related using a 3D visualization. We convert linguistic data for locations along the German-Dutch border into linguistic distances that can be compared directly with geographic distances. This enables us to visualize linguistic distances as real distances by using the third dimension available in 3D modelling software. With such a visualization, we test whether descriptive dialect data support the hypothesis that the German-Dutch state border became a linguistic border between the German and Dutch dialects. Our visualization is implemented in the 3D modelling software SketchUp.