2021
Advancing Neural Question Generation for Formal Pragmatics: Learning when to generate and when to copy
Kordula De Kuthy | Madeeswaran Kannan | Haemanth Santhi Ponnusamy | Detmar Meurers
Proceedings of the First Workshop on Integrating Perspectives on Discourse Annotation
Exploring Input Representation Granularity for Generating Questions Satisfying Question-Answer Congruence
Madeeswaran Kannan | Haemanth Santhi Ponnusamy | Kordula De Kuthy | Lukas Stein | Detmar Meurers
Proceedings of the 14th International Conference on Natural Language Generation
In question generation, the question produced has to be well-formed and meaningfully related to the answer serving as input. Neural generation methods have predominantly leveraged the distributional semantics of words as representations of meaning and generated questions one word at a time. In this paper, we explore the viability of form-based and more fine-grained encodings, such as character or subword representations, for question generation. We start from the typical seq2seq architecture using word embeddings presented by De Kuthy et al. (2020), who generate questions from text so that the answer given in the input text matches not just in meaning but also in form, satisfying question-answer congruence. We show that models trained on character and subword representations substantially outperform the published results based on word embeddings, and they do so with fewer parameters. Our approach eliminates two important problems of the word-based approach: the encoding of rare or out-of-vocabulary words and the incorrect replacement of words with semantically related ones. The character-based model substantially improves on the published results, both in terms of BLEU scores and regarding the quality of the generated questions. Going beyond the specific task, this result adds to the evidence weighing different form- and meaning-based representations for natural language processing tasks.
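A minimal sketch of the idea behind the character-level model, assuming PyTorch and invented hyperparameters (this is an illustration, not the authors' published architecture): the vocabulary shrinks to the character inventory, so out-of-vocabulary words and spurious synonym substitutions disappear by construction.

import torch
import torch.nn as nn

class CharSeq2Seq(nn.Module):
    # Hypothetical character-level encoder-decoder; all sizes are invented.
    def __init__(self, n_chars, emb_dim=64, hid_dim=256):
        super().__init__()
        # One embedding per character, not per word: no OOV tokens,
        # and a far smaller embedding table than a word-level model.
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, n_chars)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) tensors of character ids.
        _, state = self.encoder(self.embed(src))           # encode input text
        dec_out, _ = self.decoder(self.embed(tgt), state)  # teacher forcing
        return self.out(dec_out)                           # logits over characters

model = CharSeq2Seq(n_chars=30)      # toy 30-character inventory
src = torch.randint(0, 30, (2, 50))  # batch of 2 input sequences
tgt = torch.randint(0, 30, (2, 40))
logits = model(src, tgt)             # shape: (2, 40, 30)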
2020
Towards automatically generating Questions under Discussion to link information and discourse structure
Kordula De Kuthy | Madeeswaran Kannan | Haemanth Santhi Ponnusamy | Detmar Meurers
Proceedings of the 28th International Conference on Computational Linguistics
Questions under Discussion (QUD; Roberts, 2012) are emerging as a conceptually fruitful approach to spelling out the connection between the information structure of a sentence and the nature of the discourse in which the sentence can function. To make this approach useful for analyzing authentic data, Riester, Brunetti & De Kuthy (2018) presented a discourse annotation framework based on explicit pragmatic principles for determining a QUD for every assertion in a text. De Kuthy et al. (2018) demonstrate that this supports more reliable discourse structure annotation, and Ziai and Meurers (2018) show that based on explicit questions, automatic focus annotation becomes feasible. But both approaches are based on manually specified questions. In this paper, we present an automatic question generation approach to partially automate QUD annotation by generating all potentially relevant questions for a given sentence. While transformation rules can concisely capture the typical question formation process, a rule-based approach is not sufficiently robust for authentic data. We therefore employ the transformation rules to generate a large set of sentence-question-answer triples and train a neural question generation model on them to obtain both systematic question type coverage and robustness.
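A toy sketch of the triple-generation step described above, with an invented two-rule inventory (the paper's actual transformation rules are far richer): each rule questions one constituent of a declarative sentence, yielding a sentence-question-answer triple that can serve as training data for the neural model.

def generate_triples(subject, verb, verb_lemma, obj):
    # Hypothetical transformation rules over a pre-chunked clause.
    sentence = f"{subject} {verb} {obj}."
    # Rule 1: question the subject.
    yield sentence, f"Who {verb} {obj}?", subject
    # Rule 2: question the object (do-support plus the verb lemma).
    yield sentence, f"What does {subject} {verb_lemma}?", obj

for s, q, a in generate_triples("Mary", "reads", "read", "a book"):
    print(s, "|", q, "|", a)
# Mary reads a book. | Who reads a book? | Mary
# Mary reads a book. | What does Mary read? | a book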
2019
Annotating Information Structure in Italian: Characteristics and Cross-Linguistic Applicability of a QUD-Based Approach
Kordula De Kuthy | Lisa Brunetti | Marta Berardi
Proceedings of the 13th Linguistic Annotation Workshop
We present a discourse annotation study in which an annotation method based on Questions under Discussion (QUD) is applied to Italian data. The results of our inter-annotator agreement analysis show that the QUD-based approach, originally spelled out for English and German, can successfully be transferred cross-linguistically, supporting good agreement for the annotation of central information structure notions such as focus and non-at-issueness. Our annotation and inter-annotator agreement study on authentic Italian data confirms the cross-linguistic applicability of the QUD-based approach.
The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems
Ramon Ziai | Florian Nuxoll | Kordula De Kuthy | Björn Rudzewitz | Detmar Meurers
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning
2018
QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation
Kordula De Kuthy | Nils Reiter | Arndt Riester
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Generating Feedback for English Foreign Language Exercises
Björn Rudzewitz | Ramon Ziai | Kordula De Kuthy | Verena Möller | Florian Nuxoll | Detmar Meurers
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
While immediate feedback on learner language is often discussed in the Second Language Acquisition literature (e.g., Mackey 2006), few systems used in real-life educational settings provide helpful, metalinguistic feedback to learners. In this paper, we present a novel approach leveraging task information to generate the expected range of well-formed and ill-formed variability in learner answers, along with the required diagnosis and feedback. We combine this offline generation approach with an online component that matches the actual student answers against the pre-computed hypotheses. The results obtained for a set of 33,000 answers from 7th-grade German high school students learning English show that the approach successfully covers frequent answer patterns. At the same time, paraphrases and content errors require a more flexible alignment approach, for which we plan to complement the method with the CoMiC approach successfully used for the analysis of reading comprehension answers (Meurers et al., 2011).
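A minimal sketch of the offline/online split described in the abstract, with invented names and a single toy error type (the real system covers a full range of well-formed and ill-formed variability): hypotheses are precomputed per exercise, so the online step reduces to a lookup.

def expand_hypotheses(target):
    # Offline: map each anticipated answer form to a diagnosis.
    hypotheses = {target: "correct"}
    # Toy ill-formed variant: a subject-verb agreement error.
    hypotheses[target.replace("has", "have")] = "agreement error: use 'has'"
    return hypotheses

def give_feedback(student_answer, space):
    # Online: match the student answer against the precomputed space;
    # unmatched answers would fall through to a flexible aligner (e.g., CoMiC).
    return space.get(student_answer, "no match: fall back to alignment")

space = expand_hypotheses("She has gone home")
print(give_feedback("She have gone home", space))  # agreement error: use 'has'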
Feedback Strategies for Form and Meaning in a Real-life Language Tutoring System
Ramon Ziai | Björn Rudzewitz | Kordula De Kuthy | Florian Nuxoll | Detmar Meurers
Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning
2017
Developing a web-based workbook for English supporting the interaction of students and teachers
Björn Rudzewitz | Ramon Ziai | Kordula De Kuthy | Detmar Meurers
Proceedings of the Joint Workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
2016
Approximating Givenness in Content Assessment through Distributional Semantics
Ramon Ziai | Kordula De Kuthy | Detmar Meurers
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics
Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation
Kordula De Kuthy | Ramon Ziai | Detmar Meurers
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)
Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
Kordula De Kuthy | Ramon Ziai | Detmar Meurers
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
While the formal pragmatic concepts in information structure, such as the focus of an utterance, are precisely defined in theoretical linguistics and potentially very useful in conceptual and practical terms, it has turned out to be difficult to reliably annotate such notions in corpus data. We present a large-scale focus annotation effort designed to overcome this problem. Our annotation study is based on the task-based corpus CREG, which consists of answers to explicitly given reading comprehension questions. We compare focus annotation by trained annotators with a crowd-sourcing setup making use of untrained native speakers. Given the task context and an annotation process incrementally making the question form and answer type explicit, the trained annotators reach substantial agreement for focus annotation. Interestingly, the crowd-sourcing setup also supports high-quality annotation for specific subtypes of data. Finally, we turn to the question of whether the relevance of focus annotation can be extrinsically evaluated. We show that automatic short-answer assessment significantly improves for focus-annotated data. The focus-annotated CREG corpus is freely available and constitutes the largest such resource for German.
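As an illustration of the agreement figures such annotation studies report, a self-contained Cohen's kappa computation on toy focus labels (invented data, not the CREG annotations): kappa corrects raw agreement for the agreement expected by chance.

from collections import Counter

def cohens_kappa(a, b):
    # Observed agreement minus chance agreement, normalized.
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

expert = ["focus", "background", "focus", "focus"]
crowd = ["focus", "background", "background", "focus"]
print(cohens_kappa(expert, crowd))  # 0.5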