Elma Kerz


MANTIS at SMM4H’2022: Pre-Trained Language Models Meet a Suite of Psycholinguistic Features for the Detection of Self-Reported Chronic Stress
Sourabh Zanwar | Daniel Wiechmann | Yu Qiao | Elma Kerz
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes our submission to Social Media Mining for Health (SMM4H) 2022 Shared Task 8, aimed at detecting self-reported chronic stress on Twitter. Our approach leverages a pre-trained transformer model (RoBERTa) in combination with a Bidirectional Long Short-Term Memory (BiLSTM) network trained on a diverse set of psycholinguistic features. We handle the class imbalance issue in the training dataset by augmenting it with another dataset used for stress classification in social media.

The Best of Both Worlds: Combining Engineered Features with Transformers for Improved Mental Health Prediction from Reddit Posts
Sourabh Zanwar | Daniel Wiechmann | Yu Qiao | Elma Kerz
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

In recent years, there has been increasing interest in the application of natural language processing and machine learning techniques to the detection of mental health conditions (MHC) based on social media data. In this paper, we aim to improve the state-of-the-art (SoTA) detection of six MHC in Reddit posts in two ways: First, we built models leveraging Bidirectional Long Short-Term Memory (BLSTM) networks trained on in-text distributions of a comprehensive set of psycholinguistic features for more explainable MHC detection as compared to black-box solutions. Second, we combine these BLSTM models with Transformers to improve the prediction accuracy over SoTA models. In addition, we uncover nuanced patterns of linguistic markers characteristic of specific MHC.

SPADE: A Big Five-Mturk Dataset of Argumentative Speech Enriched with Socio-Demographics for Personality Detection
Elma Kerz | Yu Qiao | Sourabh Zanwar | Daniel Wiechmann
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In recent years, there has been increasing interest in automatic personality detection based on language. Progress in this area is highly contingent upon the availability of datasets and benchmark corpora. However, publicly available datasets for modeling and predicting personality traits are still scarce. While recent efforts to create such datasets from social media (Twitter, Reddit) are to be applauded, they often do not include continuous and contextualized language use. In this paper, we introduce SPADE, the first dataset with continuous samples of argumentative speech labeled with the Big Five personality traits and enriched with socio-demographic data (age, gender, education level, language background). We provide benchmark models for this dataset to facilitate further research and conduct extensive experiments. Our models leverage 436 (psycho)linguistic features extracted from transcribed speech and speaker-level metainformation with transformers. We conduct feature ablation experiments to investigate which types of features contribute to the prediction of individual personality traits.

Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features
Elma Kerz | Yu Qiao | Sourabh Zanwar | Daniel Wiechmann
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Research at the intersection of personality psychology, computer science, and linguistics has recently focused increasingly on modeling and predicting personality from language use. We report two major improvements in predicting personality traits from text data: (1) to our knowledge, the most comprehensive set of theory-based psycholinguistic features and (2) hybrid models that integrate a pre-trained Transformer Language Model BERT and Bidirectional Long Short-Term Memory (BLSTM) networks trained on within-text distributions (‘text contours’) of psycholinguistic features. We experiment with BLSTM models (with and without Attention) and with two techniques for applying pre-trained language representations from the transformer model - ‘feature-based’ and ‘fine-tuning’. We evaluate the performance of the models we built on two benchmark datasets that target the two dominant theoretical models of personality: the Big Five Essay dataset (Pennebaker and King, 1999) and the MBTI Kaggle dataset (Li et al., 2018). Our results are encouraging as our models outperform existing work on the same datasets. More specifically, our models achieve improvement in classification accuracy by 2.9% on the Essay dataset and 8.28% on the Kaggle MBTI dataset. In addition, we perform ablation experiments to quantify the impact of different categories of psycholinguistic features in the respective personality prediction models.

Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns
Daniel Wiechmann | Yu Qiao | Elma Kerz | Justus Mattern
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

There is a growing interest in the combined use of NLP and machine learning methods to predict gaze patterns during naturalistic reading. While promising results have been obtained through the use of transformer-based language models, little work has been undertaken to relate the performance of such models to general text characteristics. In this paper we report on experiments with two eye-tracking corpora of naturalistic reading and two language models (BERT and GPT-2). In all experiments, we test effects of a broad spectrum of features for predicting human reading behavior that fall into five categories (syntactic complexity, lexical richness, register-based multiword combinations, readability and psycholinguistic word properties). Our experiments show that both the features included and the architecture of the transformer-based language models play a role in predicting multiple eye-tracking measures during naturalistic reading. We also report the results of experiments aimed at determining the relative importance of features from different groups using SP-LIME.


FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German
Justus Mattern | Yu Qiao | Elma Kerz | Daniel Wiechmann | Markus Strohmaier
Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an ‘infodemic’ – a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resources are virtually unavailable for German, leaving research for the German language lagging significantly behind. In this paper, we introduce the new benchmark dataset FANG-COVID consisting of 28,056 real and 13,186 fake German news articles related to the COVID-19 pandemic as well as data on their propagation on Twitter. Furthermore, we propose an explainable textual- and social context-based model for fake news detection, compare its performance to “black-box” models and perform feature ablation to assess the relative importance of human-interpretable features in distinguishing fake news from authentic news.

Language that Captivates the Audience: Predicting Affective Ratings of TED Talks in a Multi-Label Classification Task
Elma Kerz | Yu Qiao | Daniel Wiechmann
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

The aim of the paper is twofold: (1) to automatically predict the ratings assigned by viewers to 14 categories available for TED talks in a multi-label classification task and (2) to determine what types of features drive classification accuracy for each of the categories. The focus is on features of language usage from five groups pertaining to syntactic complexity, lexical richness, register-based n-gram measures, information-theoretic measures and LIWC-style measures. We show that a Recurrent Neural Network classifier trained exclusively on within-text distributions of such features can reach relatively high levels of overall accuracy (69%) across the 14 categories. We find that features from two groups are strong predictors of the affective ratings across all categories and that there are distinct patterns of language usage for each rating category.

Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs
Elma Kerz | Daniel Wiechmann | Yu Qiao | Emma Tseng | Marcus Ströbel
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Automatically predicting the level of second language (L2) learner proficiency is an emerging topic of interest and research based on machine learning approaches to language learning and development. The key to the present paper is the combined use of what we refer to as ‘complexity contours’, a series of measurements of indices of L2 proficiency obtained by a computational tool that implements a sliding window technique, and recurrent neural network (RNN) classifiers that adequately capture the sequential information in those contours. We used the EF-Cambridge Open Language Database (Geertzen et al. 2013) with its labelled Common European Framework of Reference (CEFR) levels (Council of Europe 2018) to predict six classes of L2 proficiency levels (A1, A2, B1, B2, C1, C2) in the assessment of writing skills. Our experiments demonstrate that an RNN classifier trained on complexity contours achieves higher classification accuracy than one trained on text-average complexity scores. In a secondary experiment, we determined the relative importance of features from four distinct categories through a sensitivity-based pruning technique. Our approach makes an important contribution to the field of automated identification of language proficiency levels, more specifically, to the increasing efforts towards the empirical validation of CEFR levels.


A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN
Yu Qiao | Daniel Wiechmann | Elma Kerz
Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM)

‘Fake news’ – succinctly defined as false or misleading information masquerading as legitimate news – is a ubiquitous phenomenon and its dissemination weakens the fact-based reporting of the established news industry, making it harder for political actors, authorities, media and citizens to obtain a reliable picture. State-of-the-art language-based approaches to fake news detection that reach high classification accuracy typically rely on black box models based on word embeddings. At the same time, there are increasing calls for moving away from black-box models towards white-box (explainable) models for critical industries such as healthcare, finances, military and news industry. In this paper we performed a series of experiments where bi-directional recurrent neural network classification models were trained on interpretable features derived from multi-disciplinary integrated approaches to language. We apply our approach to two benchmark datasets. We demonstrate that our approach is promising as it achieves similar results on these two datasets as the best performing black box models reported in the literature. In a second step we report on ablation experiments geared towards assessing the relative importance of the human-interpretable features in distinguishing fake news from real news.

Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours
Elma Kerz | Fabio Pruneri | Daniel Wiechmann | Yu Qiao | Marcus Ströbel
Proceedings of the Twelfth Language Resources and Evaluation Conference

The purpose of this paper is twofold: [1] to introduce, to our knowledge, the largest available resource of keystroke logging (KSL) data generated by Etherpad (https://etherpad.org/), an open-source, web-based collaborative real-time editor, that captures the dynamics of second language (L2) production and [2] to relate the behavioral data from KSL to indices of syntactic and lexical complexity of the texts produced obtained from a tool that implements a sliding window approach capturing the progression of complexity within a text. We present the procedures and measures developed to analyze a sample of 14,913,009 keystrokes in 3,454 texts produced by 512 university students (upper-intermediate to advanced L2 learners of English) (95,354 sentences and 1,832,027 words) aiming to achieve a better alignment between keystroke-logging measures and underlying cognitive processes, on the one hand, and L2 writing performance measures, on the other hand. The resource introduced in this paper is a reflection of increasing recognition of the urgent need to obtain ecologically valid data that have the potential to transform our current understanding of mechanisms underlying the development of literacy (reading and writing) skills.

Becoming Linguistically Mature: Modeling English and German Children’s Writing Development Across School Grades
Elma Kerz | Yu Qiao | Daniel Wiechmann | Marcus Ströbel
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks. The data used come from two recently compiled corpora: The English data come from the GiC corpus (983 school children in second-, sixth-, ninth- and eleventh-grade) and the German data are from the FD-LEX corpus (930 school children in fifth- and ninth-grade). The key to this paper is the combined use of what we refer to as ‘complexity contours’, i.e. series of measurements that capture the progression of linguistic complexity within a text, and Recurrent Neural Network (RNN) classifiers that adequately capture the sequential information in those contours. Our experiments demonstrate that RNN classifiers trained on complexity contours achieve higher classification accuracy than those trained on text-average complexity scores. In a second step, we determine the relative importance of the features from four distinct categories through a Sensitivity-Based Pruning approach.


L2 Processing Advantages of Multiword Sequences: Evidence from Eye-Tracking
Elma Kerz | Arndt Heilmann | Stella Neumann
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

A substantial body of research has demonstrated that native speakers are sensitive to the frequencies of multiword sequences (MWS). Here, we ask whether and to what extent intermediate-advanced L2 speakers of English can also develop the sensitivity to the statistics of MWS. To this end, we aimed to replicate the MWS frequency effects found for adult native language speakers based on evidence from self-paced reading and sentence recall tasks in an ecologically more valid eye-tracking study. L2 speakers’ sensitivity to MWS frequency was evaluated using generalized linear mixed-effects regression with separate models fitted for each of the four dependent measures. Mixed-effects modeling revealed significantly faster processing of sentences containing MWS compared to sentences containing equivalent control items across all eyetracking measures. Taken together, these findings suggest that, in line with emergentist approaches, MWS are important building blocks of language and that similar mechanisms underlie both native and non-native language processing.

Understanding Vocabulary Growth Through An Adaptive Language Learning System
Elma Kerz | Andreas Burgdorf | Daniel Wiechmann | Stefan Meeger | Yu Qiao | Christian Kohlschein | Tobias Meisen
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning


CoCoGen - Complexity Contour Generator: Automatic Assessment of Linguistic Complexity Using a Sliding-Window Technique
Marcus Ströbel | Elma Kerz | Daniel Wiechmann | Stella Neumann
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

We present a novel approach to the automatic assessment of text complexity based on a sliding-window technique that tracks the distribution of complexity within a text. Such distribution is captured by what we term “complexity contours” derived from a series of measurements for a given linguistic complexity measure. This approach is implemented in an automatic computational tool, CoCoGen – Complexity Contour Generator, which in its current version supports 32 indices of linguistic complexity. The goal of the paper is twofold: (1) to introduce the design of our computational tool based on a sliding-window technique and (2) to showcase this approach in the area of second language (L2) learning, i.e. more specifically, in the area of L2 writing.
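
The sliding-window idea described in this abstract can be illustrated with a minimal Python sketch. It is not the CoCoGen implementation: the function names, the window size, and the mean-word-length measure are hypothetical choices made only for illustration, and the abstract does not say how the tool's 32 indices are computed or how windows are aggregated.

```python
from statistics import mean


def mean_word_length(sentence):
    """Toy complexity measure (an assumption, not one of CoCoGen's indices)."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def complexity_contour(sentences, measure, window_size=3):
    """Slide a window of `window_size` sentences over the text, one sentence
    at a time, and return the mean score of each window. The resulting list
    is the 'contour' for that measure."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    scores = [measure(s) for s in sentences]
    n_windows = max(len(scores) - window_size + 1, 0)
    return [mean(scores[i:i + window_size]) for i in range(n_windows)]


if __name__ == "__main__":
    text = ["a bb", "ccc", "dddd ee"]
    print(complexity_contour(text, mean_word_length, window_size=2))
```

A text shorter than the window yields an empty contour in this sketch; a real tool would need an explicit policy for that case.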


Missing Generalizations: A Supervised Machine Learning Approach to L2 Written Production
Daniel Wiechmann | Elma Kerz
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)