Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI

Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz, Alexander Mehler


Abstract
Automatic analysis of large corpora is a complex task, especially in terms of time efficiency. This complexity is compounded by the fact that flexible, extensible text analysis requires the continuous integration of new tools. Since the frameworks available for these purposes in NLP, and especially in the context of UIMA, are either outdated or unusable for security reasons, we present a new approach to address this integration task: Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight, and feature-rich framework for automatic distributed analysis of text corpora that leverages Big Data experience and virtualization with Docker. We evaluate DUUI’s communication approach against a state-of-the-art approach and demonstrate its outstanding behavior in terms of time efficiency, enabling the analysis of big text data.
Anthology ID:
2023.findings-emnlp.29
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
385–399
URL:
https://aclanthology.org/2023.findings-emnlp.29
DOI:
10.18653/v1/2023.findings-emnlp.29
Cite (ACL):
Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz, and Alexander Mehler. 2023. Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 385–399, Singapore. Association for Computational Linguistics.
Cite (Informal):
Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI (Leonhardt et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.29.pdf