Evaluation of Automatically Generated Transcriptions of Non-Native Pronunciations using a Phonetic Distance Measure

Stefan Schaden


Abstract
The paper reports on the evaluation of a rule-based technique to model prototypical non-native pronunciation variants on the symbolic transcription level. This technique was developed to explore the possibility of an automatic generation of adapted pronunciation lexicons for different non-native speaker groups. The rule sets, which are currently available for nine language directions, are based on non-native speech data compiled specifically for this purpose. Since manual phonetic annotations are available for the speech data, the evaluation was performed on the transcription level by measuring the phonetic distance of the automatically generated pronunciations variants and actual pronunciations of non-native speakers. One of the central questions to be addressed by the evaluation is whether the rules have any predictive value: It has to be determined if and to what degree the rules are capable of generating realistic pronunciation variants for previously unseen speakers. Secondly, the rules should not only represent the pronunciations of individual speakers adequately; instead, they should be representative of speaker groups (cross-speaker representation). The paper outlines the evaluation methodology and presents results for selected language directions.
Anthology ID:
L06-1430
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/691_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Stefan Schaden. 2006. Evaluation of Automatically Generated Transcriptions of Non-Native Pronunciations using a Phonetic Distance Measure. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Evaluation of Automatically Generated Transcriptions of Non-Native Pronunciations using a Phonetic Distance Measure (Schaden, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/691_pdf.pdf