TLT-school: a Corpus of Non Native Children Speech

Roberto Gretter, Marco Matassoni, Stefano Bannò, Falavigna Daniele


Abstract
This paper describes “TLT-school”, a corpus of speech utterances collected in schools in northern Italy to assess the performance of students learning both English and German. The corpus was recorded in 2017 and 2018 from students aged between nine and sixteen, attending primary, middle and high school. Human experts scored all utterances against a set of predefined proficiency indicators. In addition, most of the utterances recorded in 2017 have been carefully transcribed by hand. The paper describes in detail the guidelines and procedures used for the manual transcriptions, as well as the results achieved with an automatic speech recognition system that we developed. Part of the corpus is going to be freely distributed to the scientific community, particularly to researchers interested in non-native speech recognition and in the automatic assessment of second language proficiency.
Anthology ID:
2020.lrec-1.47
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
378–385
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.47
Cite (ACL):
Roberto Gretter, Marco Matassoni, Stefano Bannò, and Falavigna Daniele. 2020. TLT-school: a Corpus of Non Native Children Speech. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 378–385, Marseille, France. European Language Resources Association.
Cite (Informal):
TLT-school: a Corpus of Non Native Children Speech (Gretter et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.47.pdf