Semi-automatically Alignment of Predicates between Speech and OntoNotes data

Niraj Shrestha; Marie-Francine Moens


Abstract

Speech data is receiving growing attention and is an important source of information. However, unlike for written data, we still lack suitable corpora of transcribed speech annotated with semantic roles that can be used for semantic role labeling (SRL). Semantic role labeling of speech data is a challenging and complex task because sentence boundaries are missing and transcriptions contain many errors, such as insertions, deletions and misspellings of words. In written data, SRL evaluation is performed at the sentence level, but in speech data sentence boundary identification is still a bottleneck, which makes evaluation more complex. In this work, we semi-automatically align the predicates found in speech transcribed by an automatic speech recognizer (ASR) with the predicates found in the corresponding written documents of the OntoNotes corpus. We then manually align the semantic roles of these predicates, thus obtaining annotated semantic frames in the speech data. The resulting data can serve as gold-standard alignments for future research on semantic role labeling of speech data.
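To illustrate why insertions, deletions and misrecognitions complicate predicate alignment, the sketch below aligns a written OntoNotes-style token sequence with an ASR transcript using a word-level Levenshtein alignment, then maps predicate positions from the written text onto the transcript. This is a minimal illustration under stated assumptions, not the authors' procedure: the functions `align_tokens` and `map_predicates` are hypothetical, the alignment is a plain edit-distance backtrace with case-insensitive matching, and any predicate that is deleted or misrecognised is simply returned as `None` so it can be flagged for manual inspection.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align_tokens(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Hypothetical word-level Levenshtein alignment.

    Returns (ref_idx, hyp_idx) pairs; (i, None) is a word deleted by the ASR,
    (None, j) is a word inserted by the ASR, (i, j) is a match or substitution.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j

    def sub(i: int, j: int) -> int:
        # Substitution cost between ref[i] and hyp[j] (case-insensitive)
        return 0 if ref[i].lower() == hyp[j].lower() else 1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i - 1, j - 1),  # match/substitution
                             cost[i - 1][j] + 1,                      # deletion
                             cost[i][j - 1] + 1)                      # insertion

    # Backtrace from the bottom-right cell, preferring the diagonal step
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def map_predicates(ref: List[str], hyp: List[str],
                   predicate_idxs: Set[int]) -> Dict[int, Optional[int]]:
    """Map predicate positions in the written text to ASR positions.

    Only exact (case-insensitive) matches are mapped; deleted or misrecognised
    predicates map to None so they can be reviewed manually.
    """
    mapping: Dict[int, Optional[int]] = {}
    for r, h in align_tokens(ref, hyp):
        if r is not None and r in predicate_idxs:
            exact = h is not None and ref[r].lower() == hyp[h].lower()
            mapping[r] = h if exact else None
    return mapping


if __name__ == "__main__":
    written = "the president announced sanctions and warned allies".split()
    asr = "uh the president announced sanctions and one allies".split()
    # Predicates "announced" (index 2) and "warned" (index 5) in the written text
    print(map_predicates(written, asr, {2, 5}))  # → {2: 3, 5: None}
```

In this toy example the filler "uh" is an ASR insertion, so "announced" shifts from position 2 to position 3, while "warned" was misrecognised as "one" and is left unmapped, which is the kind of case that would still need human judgement in a semi-automatic workflow.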

Anthology ID:: L16-1222
Volume:: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:: May
Year:: 2016
Address:: Portorož, Slovenia
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
Pages:: 1397–1401
URL:: https://aclanthology.org/L16-1222/
Cite (ACL):: Niraj Shrestha and Marie-Francine Moens. 2016. Semi-automatically Alignment of Predicates between Speech and OntoNotes data. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1397–1401, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):: Semi-automatically Alignment of Predicates between Speech and OntoNotes data (Shrestha & Moens, LREC 2016)
PDF:: https://aclanthology.org/L16-1222.pdf
