Automatic Rich Annotation of Large Corpus of Conversational transcribed speech: the Chunking Task of the EPAC Project

Jean-Yves Antoine; Abdenour Mokrane; Nathalie Friburger

Abstract

This paper describes the use of the CasSys platform to chunk conversational speech transcripts by means of cascades of Unitex transducers. Our system is part of the EPAC project of the French National Research Agency (ANR). The aim of this project is to develop robust methods for annotating audio/multimedia document collections that contain conversational speech sequences, such as TV or radio programs. The paper first presents the EPAC project and the adaptation of a former chunking system (Romus), which was developed in the restricted framework of dedicated spoken man-machine dialogue. It then describes the problems that arise from 1) spontaneous speech disfluencies and 2) errors from the previous processing stages (automatic speech recognition and POS tagging).

Anthology ID: L08-1521
Volume: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month: May
Year: 2008
Address: Marrakech, Morocco
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/172_paper.pdf
Cite (ACL): Jean-Yves Antoine, Abdenour Mokrane, and Nathalie Friburger. 2008. Automatic Rich Annotation of Large Corpus of Conversational transcribed speech: the Chunking Task of the EPAC Project. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal): Automatic Rich Annotation of Large Corpus of Conversational transcribed speech: the Chunking Task of the EPAC Project (Antoine et al., LREC 2008)
PDF: http://www.lrec-conf.org/proceedings/lrec2008/pdf/172_paper.pdf
