The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation

Jorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, Fernando Perdigão


Abstract
This paper introduces the LetsRead Corpus of European Portuguese read speech from children aged 6 to 10. The motivation for creating this corpus stems from the lack of databases with recordings of reading tasks by Portuguese children of different performance levels that include all the common reading-aloud disfluencies. The corpus is also essential for developing techniques that fulfill the main objective of the LetsRead project: to automatically evaluate children's reading performance through the analysis of reading tasks. The collected data amounts to 20 hours of speech from 284 children from private and public Portuguese schools. Each child carried out two tasks, reading sentences and reading a list of pseudowords, both with difficulty levels that vary across school grades. This paper describes the design of the reading tasks presented to the children, as well as the collection procedure. The manually annotated data is analyzed in terms of disfluencies and reading performance. The word difficulty parameter considered is also confirmed to be suitable for the pseudoword reading tasks.
Anthology ID:
L16-1125
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
781–785
URL:
https://aclanthology.org/L16-1125
Cite (ACL):
Jorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, and Fernando Perdigão. 2016. The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 781–785, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The LetsRead Corpus of Portuguese Children Reading Aloud for Performance Evaluation (Proença et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1125.pdf