Thomas Huber
2024
Let’s discuss! Quality Dimensions and Annotated Datasets for Computational Argument Quality Assessment
Rositsa V Ivanova
|
Thomas Huber
|
Christina Niklaus
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Research in the computational assessment of Argumentation Quality has gained popularity over the last ten years. Various quality dimensions have been explored through the creation of domain-specific datasets and assessment methods. We survey the related literature (211 publications and 32 datasets), while addressing potential overlaps and blurry boundaries to related domains. This paper provides a representative overview of the state of the art in Computational Argument Quality Assessment with a focus on quality dimensions and annotated datasets. The aim of the survey is to identify research gaps and to aid future discussions and work in the domain.
2023
Enhancing Educational Dialogues: A Reinforcement Learning Approach for Generating AI Teacher Responses
Thomas Huber
|
Christina Niklaus
|
Siegfried Handschuh
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Reinforcement Learning remains an underutilized method of training and fine-tuning Language Models (LMs) despite recent successes. This paper presents a simple approach of fine-tuning a language model with Reinforcement Learning to achieve competitive performance on the BEA 2023 Shared Task whose goal is to automatically generate teacher responses in educational dialogues. We utilized the novel NLPO algorithm that masks out tokens during generation to direct the model towards generations that maximize a reward function. We show results for both the t5-base model with 220 million parameters from the HuggingFace repository submitted to the leaderboard that, despite its comparatively small size, has achieved a good performance on both test and dev set, as well as GPT-2 with 124 million parameters. The presented results show that despite maximizing only one of the metrics used in the evaluation as a reward function our model scores highly in the other metrics as well.