Klaus Zechner


2019

Using Rhetorical Structure Theory to Assess Discourse Coherence for Non-native Spontaneous Speech
Xinhao Wang | Binod Gyawali | James V. Bruno | Hillary R. Molloy | Keelan Evanini | Klaus Zechner
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

This study aims to model the discourse structure of spontaneous spoken responses within the context of an assessment of English speaking proficiency for non-native speakers. Rhetorical Structure Theory (RST) has been commonly used in the analysis of discourse organization of written texts; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Because the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we conducted research to obtain RST annotations on non-native spoken responses from a standardized assessment of academic English proficiency. Subsequently, automatic parsers were trained on these annotations to process non-native spontaneous speech. Finally, a set of features was extracted from automatically generated RST trees to evaluate the discourse structure of non-native spontaneous speech; these features were then employed to further improve the validity of an automated speech scoring system.

The many dimensions of algorithmic fairness in educational applications
Anastassia Loukina | Nitin Madnani | Klaus Zechner
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

The issues of algorithmic fairness and bias have recently featured prominently in many publications, highlighting the fact that training algorithms for maximum performance may often result in predictions that are biased against various groups. Educational applications based on NLP and speech processing technologies often combine multiple complex machine learning algorithms and are thus vulnerable to the same sources of bias as other machine learning systems. Yet such systems can have a high impact on people’s lives, especially when deployed as part of high-stakes tests. In this paper, we discuss different definitions of fairness and possible ways to apply them to educational applications. We then use simulated and real data to consider how test-takers’ native language backgrounds can affect their automated scores on an English language proficiency assessment. We illustrate that total fairness may not be achievable and that different definitions of fairness may require different solutions.

Toward Automated Content Feedback Generation for Non-native Spontaneous Speech
Su-Youn Yoon | Ching-Ni Hsieh | Klaus Zechner | Matthew Mulholland | Yuan Wang | Nitin Madnani
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this study, we developed an automated algorithm to provide feedback about the specific content of non-native English speakers’ spoken responses. The responses were spontaneous speech, elicited using integrated tasks where the language learners listened to and/or read passages and integrated the core content in their spoken responses. Our models detected the absence of key points considered to be important in a spoken response to a particular test question, based on two different models: (a) a model using word-embedding-based content features and (b) a state-of-the-art short response scoring engine using traditional n-gram-based features. Both models achieved substantially improved performance over the majority baseline, and the combination of the two models achieved a significant further improvement. In particular, the models were robust to automated speech recognition (ASR) errors, and performance based on the ASR word hypotheses was comparable to that based on manual transcriptions. The accuracy and F-score of the best model for the questions included in the training set were 0.80 and 0.68, respectively. Finally, we discussed possible approaches to generating targeted feedback about the content of a language learner’s response, based on automatically detected missing key points.

2018

Using exemplar responses for training and evaluating automated speech scoring systems
Anastassia Loukina | Klaus Zechner | James Bruno | Beata Beigman Klebanov
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automated scoring engines are usually trained and evaluated against human scores and compared to the benchmark of human-human agreement. In this paper, we compare the performance of an automated speech scoring engine using two corpora: a corpus of almost 700,000 randomly sampled spoken responses with scores assigned by one or two raters during operational scoring, and a corpus of 16,500 exemplar responses with scores reviewed by multiple expert raters. We show that the choice of corpus used for model evaluation has a major effect on estimates of system performance, with r varying between 0.64 and 0.80. Surprisingly, this is not the case for the choice of corpus for model training: when the training corpus is sufficiently large, the systems trained on different corpora showed almost identical performance when evaluated on the same corpus. We show that this effect is consistent across several learning algorithms. We conclude that evaluating the model on a corpus of exemplar responses, if one is available, provides additional evidence about system validity; at the same time, investing effort into creating a corpus of exemplar responses for model training is unlikely to lead to a substantial gain in model performance.

Atypical Inputs in Educational Applications
Su-Youn Yoon | Aoife Cahill | Anastassia Loukina | Klaus Zechner | Brian Riordan | Nitin Madnani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, a small number of responses have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of what are called non-scorable responses, along with effective filtering models based on various NLP and speech processing technologies. We give an overview of two operational automated scoring systems (one for essay scoring and one for speech scoring) and describe the filtering models they use. Finally, we present an evaluation and analysis of filtering models used for spoken responses in an assessment of language proficiency.

2017

Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework
Xinhao Wang | James Bruno | Hillary Molloy | Keelan Evanini | Klaus Zechner
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The availability of the Rhetorical Structure Theory (RST) Discourse Treebank has spurred substantial research into discourse analysis of written texts; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Considering that the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we initiated a research effort to obtain RST annotations of a large number of non-native spoken responses from a standardized assessment of academic English proficiency. The resulting inter-annotator kappa agreements on the three different levels of Span, Nuclearity, and Relation are 0.848, 0.766, and 0.653, respectively. Furthermore, a set of features was explored to evaluate the discourse structure of non-native spontaneous speech based on these annotations; the highest performing feature resulted in a correlation of 0.612 with scores of discourse coherence provided by expert human raters.

2015

Feature selection for automated speech scoring
Anastassia Loukina | Klaus Zechner | Lei Chen | Michael Heilman
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

2014

Automatic evaluation of spoken summaries: the case of language assessment
Anastassia Loukina | Klaus Zechner | Lei Chen
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language
Klaus Zechner | Keelan Evanini | Su-Youn Yoon | Lawrence Davis | Xinhao Wang | Lei Chen | Chong Min Lee | Chee Wee Leong
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

2013

Coherence Modeling for the Automated Assessment of Spontaneous Spoken Responses
Xinhao Wang | Keelan Evanini | Klaus Zechner
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Automated Content Scoring of Spoken Responses in an Assessment for Teachers of English
Klaus Zechner | Xinhao Wang
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Prompt-based Content Scoring for Automated Spoken Language Assessment
Keelan Evanini | Shasha Xie | Klaus Zechner
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

Exploring Content Features for Automated Speech Scoring
Shasha Xie | Keelan Evanini | Klaus Zechner
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Using an Ontology for Improved Automated Content Scoring of Spontaneous Non-Native Speech
Miao Chen | Klaus Zechner
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

Vocabulary Profile as a Measure of Vocabulary Sophistication
Su-Youn Yoon | Suma Bhat | Klaus Zechner
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2011

Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
Miao Chen | Klaus Zechner
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Non-scorable Response Detection for Automated Speaking Proficiency Assessment
Su-Youn Yoon | Keelan Evanini | Klaus Zechner
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

2010

Using Amazon Mechanical Turk for Transcription of Non-Native Speech
Keelan Evanini | Derrick Higgins | Klaus Zechner
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2009

Automatic Scoring of Children’s Read-Aloud Text Passages and Word Lists
Klaus Zechner | John Sabatini | Lei Chen
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

Improved pronunciation features for construct-driven assessment of non-native spontaneous speech
Lei Chen | Klaus Zechner | Xiaoming Xi
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Towards Automatic Scoring of a Test of Spoken Language with Heterogeneous Task Types
Klaus Zechner | Xiaoming Xi
Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications

2006

Towards Automatic Scoring of Non-Native Spontaneous Speech
Klaus Zechner | Isaac Bejar
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Proceedings of the Analyzing Conversations in Text and Speech
Eduard Hovy | Klaus Zechner | Liang Zhou
Proceedings of the Analyzing Conversations in Text and Speech

2003

Efficient Optimization for Bilingual Sentence Alignment Based on Linear Regression
Bing Zhao | Klaus Zechner | Stephen Vogel | Alex Waibel
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2002

Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
Klaus Zechner
Computational Linguistics, Volume 28, Number 4, December 2002

2000

Minimizing Word Error Rate in Textual Summaries of Spoken Language
Klaus Zechner | Alex Waibel
1st Meeting of the North American Chapter of the Association for Computational Linguistics

DIASUMM: Flexible Summarization of Spontaneous Dialogues in Unrestricted Domains
Klaus Zechner | Alex Waibel
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1998

Automatic Construction of Frame Representations for Spontaneous Speech in Unrestricted Domains
Klaus Zechner
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Using Chunk Based Partial Parsing of Spontaneous Speech in Unrestricted Domains for Reducing Word Error Rate in Speech Recognition
Klaus Zechner | Alex Waibel
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Automatic Construction of Frame Representations for Spontaneous Speech in Unrestricted Domains
Klaus Zechner
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Using Chunk Based Partial Parsing of Spontaneous Speech in Unrestricted Domains for Reducing Word Error Rate in Speech Recognition
Klaus Zechner | Alex Waibel
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

High Performance Segmentation of Spontaneous Speech Using Part of Speech and Trigger Word Information
Marsal Gavalda | Klaus Zechner | Gregory Aist
Fifth Conference on Applied Natural Language Processing

1996

Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences
Klaus Zechner
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics