Christian Heumann


2022

pdf bib
Pre-trained language models evaluating themselves - A comparative study
Philipp Koch | Matthias Aßenmacher | Christian Heumann
Proceedings of the Third Workshop on Insights from Negative Results in NLP

Evaluating generated text received new attention with the introduction of model-based metrics in recent years. These new metrics have a higher correlation with human judgments and seemingly overcome many issues of previous n-gram based metrics from the symbolic age. In this work, we examine the recently introduced metrics BERTScore, BLEURT, NUBIA, MoverScore, and Mark-Evaluate (Petersen). We investigate their sensitivity to different types of semantic deterioration (part of speech drop and negation), word order perturbations, word drop, and the common problem of repetition. No metric showed appropriate behaviour for negation, and further none of them was overall sensitive to the other issues mentioned above.

2021

pdf bib
Benchmarking down-scaled (not so large) pre-trained language models
Matthias Aßenmacher | Patrick Schulze | Christian Heumann
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

pdf bib
How to Estimate Continuous Sentiments From Texts Using Binary Training Data
Sandra Wankmüller | Christian Heumann
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)