A Speech Recognizer for Frisian/Dutch Council Meetings

Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra, Hans Van de Velde


Abstract
We developed a bilingual Frisian/Dutch speech recognizer for council meetings in Fryslân (the Netherlands). During these meetings both Frisian and Dutch are spoken, and code switching between both languages shows up frequently. The new speech recognizer is based on an existing speech recognizer for Frisian and Dutch named FAME!, which was trained and tested on historical radio broadcasts. Adapting a speech recognizer for the council meeting domain is challenging because of acoustic background noise, speaker overlap and the jargon typically used in council meetings. To train the new recognizer, we used the radio broadcast materials utilized for the development of the FAME! recognizer and added newly created manually transcribed audio recordings of council meetings from eleven Frisian municipalities, the Frisian provincial council and the Frisian water board. The council meeting recordings consist of 49 hours of speech, with 26 hours of Frisian speech and 23 hours of Dutch speech. Furthermore, from the same sources, we obtained texts in the domain of council meetings containing 11 million words; 1.1 million Frisian words and 9.9 million Dutch words. We describe the methods used to train the new recognizer, report the observed word error rates, and perform an error analysis on remaining errors.
Anthology ID:
2022.lrec-1.107
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
1009–1015
Language:
URL:
https://aclanthology.org/2022.lrec-1.107
DOI:
Bibkey:
Cite (ACL):
Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra, and Hans Van de Velde. 2022. A Speech Recognizer for Frisian/Dutch Council Meetings. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1009–1015, Marseille, France. European Language Resources Association.
Cite (Informal):
A Speech Recognizer for Frisian/Dutch Council Meetings (Bentum et al., LREC 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.lrec-1.107.pdf