Thomas Kleinbauer


2022

Exploiting Social Media Content for Self-Supervised Style Transfer
Dana Ruiter | Thomas Kleinbauer | Cristina España-Bonet | Josef van Genabith | Dietrich Klakow
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media

Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders. By contrast, the use of self-supervised NMT (SSNMT), which leverages (near) parallel instances hidden in non-parallel data more efficiently than UNMT, has not yet been explored for style transfer. In this paper we present a novel Self-Supervised Style Transfer (3ST) model, which augments SSNMT with UNMT methods in order to identify and efficiently exploit supervisory signals in non-parallel social media posts. We compare 3ST with state-of-the-art (SOTA) style transfer models across civil rephrasing, formality and polarity tasks. We show that 3ST best balances the three major objectives (fluency, content preservation, attribute transfer accuracy), outperforming SOTA models on average across their tested tasks in both automatic and human evaluation.

Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online
Dana Ruiter | Liane Reiners | Ashwin Geet D’Sa | Thomas Kleinbauer | Dominique Fohr | Irina Illina | Dietrich Klakow | Christian Schemer | Angeliki Monnier
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of ~9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by 4 native speakers per language and achieve high inter-annotator agreement (0.77 ≤ κ ≤ 1). Besides describing the corpus creation and presenting insights from a content, error and domain analysis, we explore its data characteristics by training several classification baselines.

2021

Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces
Vanessa Hahn | Dana Ruiter | Thomas Kleinbauer | Dietrich Klakow
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

Hate speech and profanity detection suffer from data sparsity, especially for languages other than English, due to the subjective nature of the tasks and the resulting annotation incompatibility of existing corpora. In this study, we identify profane subspaces in word and sentence representations and explore their generalization capability on a variety of similar and distant target tasks in a zero-shot setting. This is done monolingually (German) and cross-lingually to closely-related (English), distantly-related (French) and non-related (Arabic) tasks. We observe that, on both similar and distant target tasks and across all languages, the subspace-based representations transfer more effectively than standard BERT representations in the zero-shot setting, with improvements between F1 +10.9 and F1 +42.9 over the baselines across all tested monolingual and cross-lingual scenarios.

Preventing Author Profiling through Zero-Shot Multilingual Back-Translation
David Adelani | Miaoran Zhang | Xiaoyu Shen | Ali Davody | Thomas Kleinbauer | Dietrich Klakow
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Documents as short as a single sentence may inadvertently reveal sensitive information about their authors, such as their gender or ethnicity. Style transfer is an effective way of transforming texts in order to remove any information that enables author profiling. However, for a number of current state-of-the-art approaches, the improved privacy is accompanied by an undesirable drop in the downstream utility of the transformed data. In this paper, we propose a simple, zero-shot way to effectively lower the risk of author profiling through multilingual back-translation using off-the-shelf translation models. We compare our models with five representative text style transfer models on three datasets across different domains. Results from both an automatic and a human evaluation show that our approach achieves the best overall performance while requiring no training data. We are able to lower the adversarial prediction of gender and race by up to 22% while retaining 95% of the original utility on downstream tasks.

2019

Detection of Abusive Language: the Problem of Biased Datasets
Michael Wiegand | Josef Ruppenhofer | Thomas Kleinbauer
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We discuss the impact of data bias on abusive language detection. We show that classification scores on popular datasets reported in previous work are much lower under realistic settings in which this bias is reduced. Such biases are most notably observed on datasets created by focused sampling rather than random sampling. Datasets with a higher proportion of implicit abuse are more affected than datasets with a lower proportion.

2014

A Comparative Study of Weighting Schemes for the Interpretation of Spoken Referring Expressions
Su Nam Kim | Ingrid Zukerman | Thomas Kleinbauer | Masud Moshtaghi
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Evaluation of the Scusi? Spoken Language Interpretation System – A Case Study
Thomas Kleinbauer | Ingrid Zukerman | Su Nam Kim
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Noisy Channel Approach to Error Correction in Spoken Referring Expressions
Su Nam Kim | Ingrid Zukerman | Thomas Kleinbauer | Farshid Zavareh
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Error Detection in Automatic Speech Recognition
Farshid Zavareh | Ingrid Zukerman | Su Nam Kim | Thomas Kleinbauer
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

2010

A Learning-based Sampling Approach to Extractive Summarization
Vishal Juneja | Sebastian Germesin | Thomas Kleinbauer
Proceedings of the NAACL HLT 2010 Student Research Workshop

2007

Combining Multiple Information Layers for the Automatic Generation of Indicative Meeting Abstracts
Thomas Kleinbauer | Stephanie Becker | Tilman Becker
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)