Ramon Ziai


2021

2019

2018

While immediate feedback on learner language is often discussed in the Second Language Acquisition literature (e.g., Mackey 2006), few systems used in real-life educational settings provide helpful, metalinguistic feedback to learners. In this paper, we present a novel approach leveraging task information to generate the expected range of well-formed and ill-formed variability in learner answers along with the required diagnosis and feedback. We combine this offline generation approach with an online component that matches the actual student answers against the pre-computed hypotheses. The results obtained for a set of 33 thousand answers of 7th grade German high school students learning English show that the approach successfully covers frequent answer patterns. At the same time, paraphrases and content errors require a more flexible alignment approach, for which we are planning to complement the method with the CoMiC approach successfully used for the analysis of reading comprehension answers (Meurers et al., 2011).
Analyzing language in context, both from a theoretical and from a computational perspective, is receiving increased interest. Complementing the research in linguistics on discourse and information structure, in computational linguistics identifying discourse concepts was also shown to improve the performance of certain applications, for example, Short Answer Assessment systems (Ziai and Meurers, 2014). Building on the research that established detailed annotation guidelines for manual annotation of information structural concepts for written (Dipper et al., 2007; Ziai and Meurers, 2014) and spoken language data (Calhoun et al., 2010), this paper presents the first approach automating the analysis of focus in authentic written data. Our classification approach combines a range of lexical, syntactic, and semantic features to achieve an accuracy of 78.1% for identifying focus.

2017

2016

While the formal pragmatic concepts in information structure, such as the focus of an utterance, are precisely defined in theoretical linguistics and potentially very useful in conceptual and practical terms, it has turned out to be difficult to reliably annotate such notions in corpus data. We present a large-scale focus annotation effort designed to overcome this problem. Our annotation study is based on the tasked-based corpus CREG, which consists of answers to explicitly given reading comprehension questions. We compare focus annotation by trained annotators with a crowd-sourcing setup making use of untrained native speakers. Given the task context and an annotation process incrementally making the question form and answer type explicit, the trained annotators reach substantial agreement for focus annotation. Interestingly, the crowd-sourcing setup also supports high-quality annotation ― for specific subtypes of data. Finally, we turn to the question whether the relevance of focus annotation can be extrinsically evaluated. We show that automatic short-answer assessment significantly improves for focus annotated data. The focus annotated CREG corpus is freely available and constitutes the largest such resource for German.

2015

2014

2013

2012

2011

2010