Word-Embedding based Content Features for Automated Oral Proficiency Scoring

Su-Youn Yoon; Anastassia Loukina; Chungmin Lee; Matthew Mulholland; Xinhao Wang; Ikkyu Choi

Abstract

In this study, we develop content features for an automated system that scores the spontaneous speech of non-native English speakers. The features measure the lexical similarity between the question text and the ASR word hypothesis of the spoken response, using either traditional word vector models or word embeddings. The proposed features do not require sample training responses for each question. This is a strong advantage, since collecting question-specific data is expensive and sometimes impossible due to concerns about question exposure. We explore the impact of these new features on the automated scoring of two question types: (a) providing opinions on familiar topics and (b) answering a question about stimulus material. The proposed features showed statistically significant correlations with the oral proficiency scores, and combining them with the speech-driven features achieved a small but significant further improvement for the latter question type. Further analyses suggested that the new features were effective in assigning more accurate scores to responses with serious content issues.
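As a rough illustration of the kind of feature the abstract describes, the sketch below scores a response by the cosine similarity between the averaged word vectors of the question text and of the response transcript. It is not the authors' implementation: the averaging scheme, the whitespace tokenizer, the function names, and the three-dimensional toy embedding table are all assumptions made for the example. A real system would use pretrained embeddings and the ASR hypothesis as the response text.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "favorite": [0.9, 0.1, 0.0],
    "sport": [0.8, 0.2, 0.1],
    "soccer": [0.7, 0.3, 0.1],
    "weather": [0.0, 0.1, 0.9],
    "rain": [0.1, 0.0, 0.8],
}


def average_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity(question, response, embeddings=TOY_EMBEDDINGS):
    """Similarity between question text and a response transcript.

    Uses only the question itself, so no question-specific training
    responses are needed. Returns 0.0 when either text has no known words.
    """
    q_vec = average_vector(question.lower().split(), embeddings)
    r_vec = average_vector(response.lower().split(), embeddings)
    if q_vec is None or r_vec is None:
        return 0.0
    return cosine(q_vec, r_vec)


question = "what is your favorite sport"
on_topic = content_similarity(question, "my favorite sport is soccer")
off_topic = content_similarity(question, "the weather brings rain")
```

With these toy vectors, the on-topic response receives a higher similarity than the off-topic one, which is the behavior such a feature would need in order to flag responses with serious content issues.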

Anthology ID: W18-4002
Volume: Proceedings of the Third Workshop on Semantic Deep Learning
Month: August
Year: 2018
Address: Santa Fe, New Mexico
Editors: Luis Espinosa Anke, Dagmar Gromann, Thierry Declerck
Venue: SemDeep
Publisher: Association for Computational Linguistics
Pages: 12–22
URL: https://aclanthology.org/W18-4002/
Cite (ACL): Su-Youn Yoon, Anastassia Loukina, Chong Min Lee, Matthew Mulholland, Xinhao Wang, and Ikkyu Choi. 2018. Word-Embedding based Content Features for Automated Oral Proficiency Scoring. In Proceedings of the Third Workshop on Semantic Deep Learning, pages 12–22, Santa Fe, New Mexico. Association for Computational Linguistics.
Cite (Informal): Word-Embedding based Content Features for Automated Oral Proficiency Scoring (Yoon et al., SemDeep 2018)
PDF: https://aclanthology.org/W18-4002.pdf
