Linguistic Resources for Reconstructing Spontaneous Speech Text

Erin Fitzgerald; Frederick Jelinek

Abstract

The output of a speech recognition system is not always ideal for subsequent downstream processing, in part because speakers themselves often make mistakes. A system would accomplish speech reconstruction of its spontaneous speech input if its output were to represent, in flawless, fluent, and content-preserving English, the message that the speaker intended to convey. These cleaner speech transcripts would allow for more accurate language processing as needed for NLP tasks such as machine translation and conversation summarization, which often rely on grammatical input. Recognizing that supervised statistical methods to identify and transform ill-formed areas of the transcript will require richly labeled resources, we have built the Spontaneous Speech Reconstruction corpus. This small corpus of reconstructed and aligned conversational telephone speech transcriptions for the Fisher conversational telephone speech corpus (Strassel and Walker, 2004) was annotated on several levels including string transformations and predicate-argument structure, and will be shared with the linguistic research community.

Anthology ID: L08-1530
Volume: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month: May
Year: 2008
Address: Marrakech, Morocco
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/874_paper.pdf
Cite (ACL): Erin Fitzgerald and Frederick Jelinek. 2008. Linguistic Resources for Reconstructing Spontaneous Speech Text. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal): Linguistic Resources for Reconstructing Spontaneous Speech Text (Fitzgerald & Jelinek, LREC 2008)
PDF: http://www.lrec-conf.org/proceedings/lrec2008/pdf/874_paper.pdf
