Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque

Igor Odriozola, Inma Hernaez, María Inés Torres, Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Eva Navas


Abstract
This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the latter was recorded through mobile telephones according to the SpeechDat specifications. Both databases were created under ADITU, a program started by the Basque Government in 2005 with the aim of developing speech technologies for Basque. The databases belong to the Basque Government. This work provides a comprehensive description of both databases, highlighting where they differ from their corresponding standard specifications. The paper also presents initial experimental results for both databases in order to validate their usefulness for the development of speech recognition technology. Several applications already developed with the Basque Speecon-like database are also described. The authors also aim to make these databases widely known to the community and to foster their use by other groups.
Anthology ID:
L14-1583
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2658–2665
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/744_Paper.pdf
Cite (ACL):
Igor Odriozola, Inma Hernaez, María Inés Torres, Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, and Eva Navas. 2014. Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2658–2665, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque (Odriozola et al., LREC 2014)