Albin Zehe


2021

The FairyNet Corpus - Character Networks for German Fairy Tales
David Schmidt | Albin Zehe | Janne Lorenzen | Lisa Sergel | Sebastian Düker | Markus Krug | Frank Puppe
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents a data set of German fairy tales, manually annotated with character networks that were obtained with high inter-rater agreement. The release of this corpus provides an opportunity for training and comparing different algorithms for the extraction of character networks, which so far was barely possible due to the heterogeneous interests of previous researchers. We demonstrate the usefulness of our data set by providing baseline experiments for the automatic extraction of character networks, applying a rule-based pipeline as well as a neural approach, and find that the neural approach outperforms the rule-based approach in most evaluation settings.
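A character network is a graph whose nodes are characters and whose weighted edges record how strongly two characters are connected. The sketch below illustrates the idea with a simple co-occurrence heuristic: two characters are linked whenever they appear in the same text segment. This is an assumption made for illustration only. It is not the corpus's manual annotation scheme, nor the paper's rule-based pipeline or neural approach, and the function name `build_character_network` and the toy segments are hypothetical.

```python
from itertools import combinations
from collections import Counter


def build_character_network(segments):
    """Build an undirected, weighted co-occurrence network (hypothetical heuristic).

    `segments` is a list of sets of character names, one set per text
    segment (for example a sentence or paragraph). Returns a Counter that
    maps an alphabetically ordered (a, b) pair to the number of segments
    in which both characters occur; that count is the edge weight.
    """
    edges = Counter()
    for characters in segments:
        # Sorting ensures each unordered pair is always stored under one key.
        for a, b in combinations(sorted(characters), 2):
            edges[(a, b)] += 1
    return edges


# Toy input: characters mentioned per segment (made up, not corpus data).
segments = [
    {"Hansel", "Gretel"},
    {"Hansel", "Gretel", "Witch"},
    {"Witch"},
    {"Gretel", "Witch"},
]
network = build_character_network(segments)
for (a, b), weight in sorted(network.items()):
    print(f"{a} -- {b}: {weight}")
```

Segments with a single character contribute nodes but no edges, so the lone "Witch" segment adds nothing here.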

Detecting Scenes in Fiction: A new Segmentation Task
Albin Zehe | Leonard Konle | Lea Katharina Dümpelmann | Evelyn Gius | Andreas Hotho | Fotis Jannidis | Lucas Kaufmann | Markus Krug | Frank Puppe | Nils Reiter | Annekea Schreiber | Nathalie Wiedmer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper introduces the novel task of scene segmentation on narrative texts and provides an annotated corpus, a discussion of the linguistic and narrative properties of the task and baseline experiments towards automatic solutions. A scene here is a segment of the text where story time and discourse time are more or less equal, the narration focuses on one action and location, and character constellations stay the same. The corpus we describe consists of German-language dime novels (550k tokens) that have been annotated in parallel, achieving an inter-annotator agreement of γ = 0.7. Baseline experiments using BERT achieve an F1 score of 24%, showing that the task is very challenging. Automatic scene segmentation paves the way towards processing longer narrative texts like tales or novels by breaking them down into smaller, coherent and meaningful parts, which is an important stepping stone towards the reconstruction of plot in Computational Literary Studies but can also serve to improve tasks like coreference resolution.

2020

Improving Sentiment Analysis with Biofeedback Data
Daniel Schlör | Albin Zehe | Konstantin Kobs | Blerta Veseli | Franziska Westermeier | Larissa Brübach | Daniel Roth | Marc Erich Latoschik | Andreas Hotho
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

Humans are frequently able to read and interpret the emotions of others, either by directly taking verbal and non-verbal signals in human-to-human communication into account, or by inferring or even experiencing emotions from mediated stories. For computers, however, emotion recognition is a complex problem: thoughts and feelings are the roots of many behavioural responses, and they are deeply entangled with neurophysiological changes within humans. As such, emotions are very subjective, are often expressed in a subtle manner, and are highly dependent on context. For example, machine learning approaches for text-based sentiment analysis often rely on incorporating sentiment lexicons or language models to capture the contextual meaning. This paper explores whether and how we can further enhance sentiment analysis using biofeedback from humans who are experiencing emotions while reading texts. Specifically, we record the heart rate and brain waves of readers who are presented with short texts that have been annotated with the emotions they induce. We use these physiological signals to improve the performance of a lexicon-based sentiment classifier. We find that the combination of several biosignals can improve the ability of a text-based classifier to detect the presence of a sentiment in a text on a per-sentence level.

Where to Submit? Helping Researchers to Choose the Right Venue
Konstantin Kobs | Tobias Koopmann | Albin Zehe | David Fernes | Philipp Krop | Andreas Hotho
Findings of the Association for Computational Linguistics: EMNLP 2020

Whenever researchers write a paper, the same question arises: “Where to submit?” In this work, we introduce WTS, an open and interpretable NLP system that recommends conferences and journals to researchers based on the title, abstract, and/or keywords of a given paper. We adapt the TextCNN architecture and automatically analyze its predictions using the Integrated Gradients method to highlight words and phrases that led to the recommendation of a scientific venue. We train and test our method on publications from the fields of artificial intelligence (AI) and medicine, both derived from the Semantic Scholar dataset. WTS achieves an Accuracy@5 of approximately 83% for AI papers and 95% in the field of medicine. It is open source and available for testing on https://wheretosubmit.ml.
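Under the usual definition of top-k accuracy, Accuracy@5 counts a paper as correctly handled when its actual venue appears among the five highest-ranked recommendations. A minimal sketch of that metric follows; the function name `accuracy_at_k` and the venue rankings are hypothetical toy values, not data or code from WTS.

```python
def accuracy_at_k(ranked_predictions, true_labels, k=5):
    """Fraction of examples whose true label is among the top-k ranked predictions.

    `ranked_predictions` holds one list per example, ordered from most to
    least likely; `true_labels` holds the correct label for each example.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranked list per true label")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]  # only the first k candidates count
    )
    return hits / len(true_labels)


# Toy rankings for two papers (made up for illustration).
ranked = [
    ["ACL", "EMNLP", "NAACL", "COLING", "EACL", "TACL"],
    ["AAAI", "IJCAI", "NeurIPS", "ICML", "ICLR", "KDD"],
]
truth = ["EACL", "KDD"]
print(accuracy_at_k(ranked, truth, k=5))  # prints 0.5
```

The first paper's venue is ranked fifth and counts as a hit; the second is ranked sixth and does not, which is why a larger k can only keep or raise the score.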

2019

Team Xenophilius Lovegood at SemEval-2019 Task 4: Hyperpartisanship Classification using Convolutional Neural Networks
Albin Zehe | Lena Hettinger | Stefan Ernst | Christian Hauptmann | Andreas Hotho
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system for the SemEval 2019 Task 4 on hyperpartisan news detection. We build on an existing deep learning approach for sentence classification based on a Convolutional Neural Network. By modifying the original model with additional layers to increase its expressiveness and finally building an ensemble of multiple versions of the model, we obtain an accuracy of 67.52% and an F1 score of 73.78% on the main test dataset. We also report on additional experiments incorporating handcrafted features into the CNN and using it as a feature extractor for a linear SVM.
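The abstract does not state how the predictions of the ensemble members are combined. Majority voting over binary labels is one common option, sketched below under that assumption (1 = hyperpartisan, 0 = not hyperpartisan). The function name `majority_vote` and the model outputs are hypothetical.

```python
def majority_vote(predictions_per_model):
    """Combine binary predictions from several models by majority vote.

    `predictions_per_model` contains one list of 0/1 labels per model, all
    of the same length. With an even number of models a tie is resolved in
    favour of the positive class, an arbitrary choice made for this sketch.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_models = len(predictions_per_model)
    combined = []
    # zip(*...) groups the votes of all models for the same article.
    for votes in zip(*predictions_per_model):
        positives = sum(votes)
        combined.append(1 if 2 * positives >= n_models else 0)
    return combined


# Hypothetical outputs of three model variants on four articles.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 0, 1, 0]
print(majority_vote([model_a, model_b, model_c]))  # prints [1, 0, 1, 0]
```

An alternative design averages the models' predicted probabilities and thresholds the mean, which keeps information about how confident each member is.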

2018

ClaiRE at SemEval-2018 Task 7: Classification of Relations using Embeddings
Lena Hettinger | Alexander Dallmann | Albin Zehe | Thomas Niebler | Andreas Hotho
Proceedings of The 12th International Workshop on Semantic Evaluation

In this paper, we describe our system for SemEval-2018 Task 7 on the classification of semantic relations in scientific literature for clean (subtask 1.1) and noisy data (subtask 1.2). We compare two models for classification: a C-LSTM, which uses only word embeddings, and an SVM that also takes handcrafted features into account. To adapt to the scientific domain, we train word embeddings on scientific papers collected from arXiv.org. The handcrafted features consist of lexical features that model the semantic relations as well as the entities between which the relation holds. Classification of Relations using Embeddings (ClaiRE) achieved an F1 score of 74.89% for the first subtask and 78.39% for the second.