CORP-ORAL: Spontaneous Speech Corpus for European Portuguese

Fabíola Santos, Tiago Freitas


Abstract
Research on the Portuguese language for speech synthesis and recognition has suffered from a considerable lack of human and material resources. This has hindered the development of speech technology and speech interface platforms. One of the most significant obstacles is the lack of spontaneous speech corpora for creating, training and improving speech synthesis and recognition systems. The CORP-ORAL project was planned to fill this gap. Its aim is to build a corpus of spontaneous European Portuguese (EP) that can be used to train speech synthesis and recognition systems and to support phonetic, phonological, lexical, morphological and syntactic studies. The corpus design also allows for further lines of enquiry, such as sociolinguistic and pragmatic research. The data consist of unscripted and unprompted face-to-face dialogues between family members, friends, colleagues and unacquainted participants. All recordings are orthographically transcribed and prosodically annotated. CORP-ORAL is being built from scratch with the explicit goal of making it fully available on the internet to the scientific community and the general public.
Anthology ID:
L08-1513
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/331_paper.pdf
Cite (ACL):
Fabíola Santos and Tiago Freitas. 2008. CORP-ORAL: Spontaneous Speech Corpus for European Portuguese. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
CORP-ORAL: Spontaneous Speech Corpus for European Portuguese (Santos & Freitas, LREC 2008)