Tobias Horsmann

2019

From legal to technical concept: Towards an automated classification of German political Twitter postings as criminal offenses
Frederike Zufall | Tobias Horsmann | Torsten Zesch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Advances in the automated detection of offensive Internet postings make this mechanism very attractive to social media companies, who are increasingly under pressure to monitor and action activity on their sites. However, these advances also have important implications as a threat to the fundamental right of free expression. In this article, we analyze which Twitter posts could actually be deemed offenses under German criminal law. German law follows the deductive method of the Roman law tradition based on abstract rules as opposed to the inductive reasoning in Anglo-American common law systems. This allows us to show how legal conclusions can be reached and implemented without relying on existing court decisions. We present a data annotation schema, consisting of a series of binary decisions, for determining whether a specific post would constitute a criminal offense. This schema serves as a step towards an inexpensive creation of a sufficient amount of data for an automated classification. We find that the majority of posts deemed offensive actually do not constitute a criminal offense and still contribute to public discourse. Furthermore, laymen can provide sufficiently reliable data to an expert reference but are, for instance, more lenient in the interpretation of what constitutes a disparaging statement.

pdf bib abs

ltl.uni-due at SemEval-2019 Task 5: Simple but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter
Huangpan Zhang | Michael Wojatzki | Tobias Horsmann | Torsten Zesch
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper, we present our contribution to SemEval 2019 Task 5 Multilingual Detection of Hate, specifically in the Subtask A (English and Spanish). We compare different configurations of shallow and deep learning approaches on the English data and use the system that performs best in both sub-tasks. The resulting SVM-based system with lexicosemantic features (n-grams and embeddings) is ranked 23rd out of 69 on the English data and beats the baseline system. On the Spanish data our system is ranked 25th out of 39.

pdf bib abs

LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness
Piush Aggarwal | Tobias Horsmann | Michael Wojatzki | Torsten Zesch
Proceedings of the 13th International Workshop on Semantic Evaluation

We present results for Subtask A and C of SemEval 2019 Shared Task 6. In Subtask A, we experiment with an embedding representation of postings and use BERT to categorize postings. Our best result reaches the 10th place (out of 103). In Subtask C, we applied a two-vote classification approach with minority fallback, which is placed on the 19th rank (out of 65).

2018

pdf bib

DeepTC – An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments
Tobias Horsmann | Torsten Zesch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs

Do LSTMs really work so well for PoS tagging? – A replication study
Tobias Horsmann | Torsten Zesch
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-the-art when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the result in general, and we additionally find that the advantage of LSTMs is even bigger for larger tagsets. However, we also find that for the very large tagsets of morphologically rich languages, hand-crafted morphological lexicons are still necessary to reach state-of-the-art performance.

2016

pdf bib abs

FlexTag: A Highly Flexible PoS Tagging Framework
Torsten Zesch | Tobias Horsmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present FlexTag, a highly flexible PoS tagging framework. In contrast to monolithic implementations that can only be retrained but not adapted otherwise, FlexTag enables users to modify the feature space and the classification algorithm. Thus, FlexTag makes it easy to quickly develop custom-made taggers exactly fitting the research problem.

pdf bib

LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text
Tobias Horsmann | Torsten Zesch
Proceedings of the 10th Web as Corpus Workshop

pdf bib abs

Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
Tobias Horsmann | Torsten Zesch
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a new approach to PoS tagging where in a first step, we assign a coarse-grained tag corresponding to the main syntactic category. Based on this high-precision decision, in the second step we utilize specially trained fine-grained models with heavily reduced decision complexity. By analyzing the system under oracle conditions, we show that there is a quite large potential for significantly outperforming a competitive baseline. When we take error-propagation from the coarse-grained tagging into account, our approach is on par with the state of the art. Our approach also allows tailoring the tagger towards recognizing single word classes which are of interest e.g. for researchers searching for specific phenomena in large corpora. In a case study, we significantly outperform a standard model that also makes use of the same optimizations.