Two Architectures for Parallel Processing of Huge Amounts of Text

Mathijs Kattenberg, Zuhaitz Beloki, Aitor Soroa, Xabier Artola, Antske Fokkens, Paul Huygen, Kees Verstoep


Abstract
This paper presents two alternative NLP architectures for analyzing massive numbers of documents using parallel processing. The two architectures target different processing scenarios, namely batch processing and streaming processing. The batch-processing architecture aims to optimize the overall throughput of the system, i.e., to minimize the total time spent processing all documents. The streaming architecture aims to minimize the time needed to process documents as they arrive in real time, and is therefore especially suitable for live feeds. The paper presents experiments with both architectures and reports the overall gain when they are used for batch as well as for streaming processing. All the software described in the paper is publicly available under free licenses.
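The two objectives named in the abstract, overall throughput for batch processing and per-document time for streaming, can be illustrated with a small simulation. This is a minimal sketch and not the architectures described in the paper: the function `simulate`, the fixed per-document cost, and the greedy first-come-first-served scheduling are all assumptions made for illustration.

```python
import heapq


def simulate(arrivals, cost, workers):
    """Schedule documents first-come-first-served on identical workers.

    arrivals: sorted arrival times in seconds (hypothetical input).
    cost: processing time per document in seconds (assumed constant).
    workers: number of parallel workers.
    Returns (makespan, mean_latency).
    """
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    latencies = []
    last_finish = 0.0
    for t in arrivals:
        # The document starts when it has arrived and a worker is idle.
        start = max(t, heapq.heappop(free_at))
        finish = start + cost
        heapq.heappush(free_at, finish)
        latencies.append(finish - t)
        last_finish = max(last_finish, finish)
    makespan = last_finish - arrivals[0]
    return makespan, sum(latencies) / len(latencies)


# Batch scenario: all 8 documents are available at time 0.
batch_makespan, batch_latency = simulate([0.0] * 8, cost=1.0, workers=4)
batch_throughput = 8 / batch_makespan  # documents per second

# Streaming scenario: one document arrives every second.
stream_arrivals = [float(i) for i in range(8)]
stream_makespan, stream_latency = simulate(stream_arrivals, cost=1.0, workers=4)
```

In the batch case the quantity of interest is the makespan (or, equivalently, documents per second over the whole run); in the streaming case it is the time between a document's arrival and the end of its processing.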
Anthology ID:
L16-1714
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4513–4519
URL:
https://aclanthology.org/L16-1714/
Cite (ACL):
Mathijs Kattenberg, Zuhaitz Beloki, Aitor Soroa, Xabier Artola, Antske Fokkens, Paul Huygen, and Kees Verstoep. 2016. Two Architectures for Parallel Processing of Huge Amounts of Text. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4513–4519, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Two Architectures for Parallel Processing of Huge Amounts of Text (Kattenberg et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1714.pdf