Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech

Roberto Seara, Marta Martinez, Rocío Varela, Carmen García Mateo, Elisa Fernandez Rei, Xosé Luis Regueira


Abstract
The “Corpus Oral Informatizado da Lingua Galega (CORILGA)” project aims to build a corpus of oral Galician, designed primarily for the study of linguistic variation and change. The project is currently under development and is periodically enriched with new contributions. The long-term goal is for all speech recordings to be enriched with ELAN-compliant phonetic, syllabic, morphosyntactic, lexical and sentence annotations. One way to speed up the annotation process is to use automatic speech-recognition-based tools tailored to the application. The CORILGA repository has therefore been enhanced with an automatic alignment tool, available to the repository administrator, that aligns speech with an orthographic transcription. When no transcription, or only a partial one, is available, a speech recognizer for Galician is used to generate word and phonetic segmentations. These recognized outputs may contain errors that the administrator will have to correct manually. To assist with this task, the tool also provides an ELAN tier with the confidence measure of each recognized word. This paper first describes the main features of the CORILGA corpus and then the speech alignment and recognition tools, both of which have been developed using the Kaldi toolkit.
Anthology ID:
L16-1616
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3893–3898
URL:
https://aclanthology.org/L16-1616
Cite (ACL):
Roberto Seara, Marta Martinez, Rocío Varela, Carmen García Mateo, Elisa Fernandez Rei, and Xosé Luis Regueira. 2016. Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3893–3898, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech (Seara et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1616.pdf