Low-cost Customized Speech Corpus Creation for Speech Technology Applications

Kazuaki Maeda; Christopher Cieri; Kevin Walker

Abstract

Speech technology applications, such as speech recognition, speech synthesis, and speech dialog systems, often require corpora built to highly customized specifications. Existing corpora available to the community, such as TIMIT and other corpora distributed by LDC and ELDA, do not always meet the requirements of such applications. In such cases, developers must create their own corpora. Creating a highly customized speech corpus, however, can be very expensive and time-consuming, especially for small organizations. It requires multidisciplinary expertise in linguistics, management, and engineering, because it involves subtasks such as corpus design, human subject recruitment, recording, quality assurance, and, in some cases, segmentation, transcription, and annotation. This paper describes LDC's recent involvement in creating a low-cost yet highly customized speech corpus for a commercial organization under a novel data creation and licensing model that benefits both the particular data requester and the general linguistic data user community.

Anthology ID:: L06-1484
Volume:: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:: May
Year:: 2006
Address:: Genoa, Italy
Editors:: Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
URL:: http://www.lrec-conf.org/proceedings/lrec2006/pdf/776_pdf.pdf
Cite (ACL):: Kazuaki Maeda, Christopher Cieri, and Kevin Walker. 2006. Low-cost Customized Speech Corpus Creation for Speech Technology Applications. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):: Low-cost Customized Speech Corpus Creation for Speech Technology Applications (Maeda et al., LREC 2006)
