Giuseppe Abrami


Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI
Alexander Leonhardt | Giuseppe Abrami | Daniel Baumartz | Alexander Mehler
Findings of the Association for Computational Linguistics: EMNLP 2023

Automatic analysis of large corpora is a complex task, especially in terms of time efficiency. This complexity is increased by the fact that flexible, extensible text analysis requires the continuous integration of ever new tools. Since there are no adequate frameworks for these purposes in the field of NLP, and especially in the context of UIMA, that are not outdated or unusable for security reasons, we present a new approach to address the latter task: Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight, and feature-rich framework for automatic distributed analysis of text corpora that leverages Big Data experience and virtualization with Docker. We evaluate DUUI’s communication approach against a state-of-the-art approach and demonstrate its outstanding behavior in terms of time efficiency, enabling the analysis of big text data.


German Parliamentary Corpus (GerParCor)
Giuseppe Abrami | Mevlüt Bagci | Leon Hammerla | Alexander Mehler
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliamentary Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.

I still have Time(s): Extending HeidelTime for German Texts
Andy Luecking | Manuel Stoeckel | Giuseppe Abrami | Alexander Mehler
Proceedings of the Thirteenth Language Resources and Evaluation Conference

HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime’s pattern matching system is based on regular expressions, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTimeExt. The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7 % or 8.5 %, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeExt, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTimeExt can be obtained from


Unleashing annotations with TextAnnotator: Multimedia, multi-perspective document views for ubiquitous annotation
Giuseppe Abrami | Alexander Henlein | Andy Lücking | Attila Kett | Pascal Adeberg | Alexander Mehler
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

We argue that, mainly due to technical innovation in the landscape of annotation tools, a conceptual change in annotation models and processes is also on the horizon. It is diagnosed that these changes are bound up with multi-media and multi-perspective facilities of annotation tools, in particular when considering virtual reality (VR) and augmented reality (AR) applications, their potential ubiquitous use, and the exploitation of externally trained natural language pre-processing methods. Such developments potentially lead to a dynamic and exploratory heuristic construction of the annotation process. With TextAnnotator, an annotation suite is introduced which focuses on multi-mediality and multi-perspectivity with an interoperable set of task-specific annotation modules (e.g., for word classification, rhetorical structures, dependency trees, semantic roles, and more) and their linkage to VR and mobile implementations. The basic architecture and usage of TextAnnotator are described and related to the above-mentioned shifts in the field.


Transfer of ISOSpace into a 3D Environment for Annotations and Applications
Alexander Henlein | Giuseppe Abrami | Attila Kett | Alexander Mehler
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

People’s visual perception is very pronounced and therefore it is usually no problem for them to describe the space around them in words. Conversely, people also have no problems imagining a concept of a described space. In recent years many efforts have been made to develop a linguistic concept for spatial and spatial-temporal relations. However, the systems have not really caught on so far, which in our opinion is due to the complex models on which they are based and the lack of available training data and automated taggers. In this paper we describe a project to support spatial annotation, which could facilitate annotation by its many functions, but also enrich it with much more information. This is to be achieved by an extension by means of a VR environment, with which spatial relations can be better visualized and connected with real objects. We also want to use the available data to develop a new state-of-the-art tagger and thus lay the foundation for future systems such as improved text understanding for Text2Scene.

TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts
Giuseppe Abrami | Manuel Stoeckel | Alexander Mehler
Proceedings of the Twelfth Language Resources and Evaluation Conference

The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.


A UIMA Database Interface for Managing NLP-related Text Annotations
Giuseppe Abrami | Alexander Mehler
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations
Philipp Helfrich | Elias Rieb | Giuseppe Abrami | Andy Lücking | Alexander Mehler
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)