Linguistic Resources for Speech Parsing

Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, Matthew Lease


Abstract
We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech: separately annotating transcribed speech for structural metadata, or structural events, (fillers, speech repairs ( or edit dysfluencies) and SUs, or syntactic/semantic units) and for syntactic structure (treebanking constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.
Anthology ID:
L06-1471
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/755_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, and Matthew Lease. 2006. Linguistic Resources for Speech Parsing. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Linguistic Resources for Speech Parsing (Bies et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/755_pdf.pdf