Daniel Baumartz


2023

Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI
Alexander Leonhardt | Giuseppe Abrami | Daniel Baumartz | Alexander Mehler
Findings of the Association for Computational Linguistics: EMNLP 2023

Automatic analysis of large corpora is a complex task, especially in terms of time efficiency. This complexity is increased by the fact that flexible, extensible text analysis requires the continuous integration of ever new tools. Since there are no adequate frameworks for these purposes in the field of NLP, and especially in the context of UIMA, that are not outdated or unusable for security reasons, we present a new approach to address the latter task: Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight, and feature-rich framework for automatic distributed analysis of text corpora that leverages Big Data experience and virtualization with Docker. We evaluate DUUI’s communication approach against a state-of-the-art approach and demonstrate its outstanding behavior in terms of time efficiency, enabling the analysis of big text data.

2018

FastSense: An Efficient Word Sense Disambiguation Classifier
Tolga Uslu | Alexander Mehler | Daniel Baumartz | Wahed Hemati
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

LTV: Labeled Topic Vector
Daniel Baumartz | Tolga Uslu | Alexander Mehler
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In this paper we present LTV, a website and API that generates labeled topic classifications based on the Dewey Decimal Classification (DDC), an international standard for topic classification in libraries. We introduce nnDDC, a largely language-independent neural network-based classifier for DDC, which we optimized using a wide range of linguistic features to achieve an F-score of 87.4%. To show that our approach is language-independent, we evaluate nnDDC using up to 40 different languages. We derive a topic model based on nnDDC, which generates probability distributions over semantic units for any input at the sense, word, and text level. Unlike related approaches, however, these probabilities are estimated by means of nnDDC so that each dimension of the resulting vector representation is uniquely labeled by a DDC class. In this way, we introduce a neural network-based Classifier-Induced Semantic Space (nnCISS).

2017

TextImager as a Generic Interface to R
Tolga Uslu | Wahed Hemati | Alexander Mehler | Daniel Baumartz
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

R is a very powerful framework for statistical modeling. Thus, it is of high importance to integrate R with state-of-the-art tools in NLP. In this paper, we present the functionality and architecture of such an integration by means of TextImager. We use the OpenCPU API to integrate R based on our own R-Server. This allows for communicating with R-packages and combining them with TextImager’s NLP-components.