The xtsv Framework and the Twelve Virtues of Pipelines

Balázs Indig, Bálint Sass, Iván Mittelholcz


Abstract
We present xtsv, an abstract framework for building NLP pipelines. It covers several kinds of functionality that can be implemented at an abstract level. We survey these features and argue that all of them are desirable in a modern pipeline. The framework has a simple yet powerful internal communication format, which is essentially TSV (tab-separated values) with a header, plus some additional features. We emphasize the capabilities of the framework, for example the ease with which new modules can be integrated or replaced, and the variety of its usage options. When a module is put into xtsv, all functionalities of the system are immediately available for that module, and the module can be part of an xtsv pipeline. The design also allows convenient inspection and manual correction of the data flowing from one module to another. We demonstrate the power of our framework with a successful application: a concrete NLP pipeline for Hungarian called the e-magyar text processing system (emtsv), which integrates Hungarian NLP tools in xtsv. All the advantages of the pipeline come from the inherent properties of the xtsv framework.
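To make the idea of a TSV-with-header communication format concrete, here is a minimal Python sketch of modules that each read some columns and append a new one. It is not the xtsv API: the function names (`read_tsv`, `write_tsv`, `lowercase_module`, `length_module`, `run_pipeline`) and the column names (`form`, `lower`, `length`) are hypothetical, chosen only for illustration, and the sketch ignores the additional features of the format that the abstract mentions.

```python
def read_tsv(text):
    """Parse TSV text whose first line is a header into (header, rows)."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    # Each data line becomes a dict keyed by column name; empty lines are skipped.
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:] if line]
    return header, rows


def write_tsv(header, rows):
    """Serialize (header, rows) back to TSV text with a header line."""
    out = ["\t".join(header)]
    out += ["\t".join(row[col] for col in header) for row in rows]
    return "\n".join(out) + "\n"


def lowercase_module(header, rows):
    """Hypothetical module: requires column 'form', adds column 'lower'."""
    if "form" not in header:
        raise ValueError("missing required column: form")
    new_rows = [{**row, "lower": row["form"].lower()} for row in rows]
    return header + ["lower"], new_rows


def length_module(header, rows):
    """Hypothetical module: requires column 'lower', adds column 'length'."""
    if "lower" not in header:
        raise ValueError("missing required column: lower")
    new_rows = [{**row, "length": str(len(row["lower"]))} for row in rows]
    return header + ["length"], new_rows


def run_pipeline(text, modules):
    """Run the modules in order; each one sees the columns added before it."""
    header, rows = read_tsv(text)
    for module in modules:
        header, rows = module(header, rows)
    return write_tsv(header, rows)


if __name__ == "__main__":
    print(run_pipeline("form\nHello\nWorld\n", [lowercase_module, length_module]), end="")
```

Because every intermediate result is plain TSV with named columns, the output of any step in this sketch can be dumped, inspected, or edited by hand before being fed to the next step, which is the kind of workflow the abstract describes.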
Anthology ID:
2020.lrec-1.871
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7044–7052
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.871
Cite (ACL):
Balázs Indig, Bálint Sass, and Iván Mittelholcz. 2020. The xtsv Framework and the Twelve Virtues of Pipelines. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 7044–7052, Marseille, France. European Language Resources Association.
Cite (Informal):
The xtsv Framework and the Twelve Virtues of Pipelines (Indig et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.871.pdf