Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI



Introduction
Automatic analysis of text corpora is an important task in many research areas that use Natural Language Processing (NLP). This includes fields as diverse as digital humanities (e.g. Brooke et al., 2015), economics (e.g. Qureshi et al., 2022), or biodiversity research (Driller et al., 2020). All these areas are supported by the availability of increasingly large text repositories such as newspaper corpora (e.g. New York Times, 2019), parliamentary corpora (e.g. Barbaresi, 2018; Rauh and Schwalbach, 2020; Abrami et al., 2022) or Wikipedia (Pasternack and Roth, 2008). However, text processing of large corpora is time-consuming, resource-intensive, and thus costly. In addition, NLP routines are required as pre-processing steps for even more time-consuming tasks such as textual entailment (Paramasivam and Nirmala, 2021), rhetorical analysis (Joty et al., 2015) or argument mining (Ding et al., 2022). In any event, the application of NLP methods is gaining acceptance in almost all text-based disciplines. This increasingly leads to scenarios in which ever larger corpora have to be processed regarding ever more complex tasks while ensuring a mixed methods approach (Johnson et al., 2007). In contrast, project participants, especially those outside the NLP field, often have limited computer skills. Thus, a suitable framework for flexible, transparent, and time-optimized pre-processing of large corpora is essential, allowing the integration of ever new analysis methods. For this purpose, software is required which serves the following functions:

(A) Horizontal and vertical scaling: Since time is a significant factor, pre-processing must be horizontally and vertically scalable. That is, pre-processing tools should be deployable on different systems, regardless of whether they are servers or workstations. This is what we call horizontal scaling. In addition, resource-intensive processes should be automatically delegated to resource-rich machines to enable their parallelization. This is what we call vertical scaling. Increasingly large corpora demand similar scaling properties of NLP (Divita et al., 2015; Kim et al., 2021).
(B) Capturing heterogeneous annotation landscapes: For the various pre-processing tasks (e.g., tokenization, lemmatization, PoS tagging, parsing, coreference analysis, NER), there is a variety of tools (e.g. Stanford CoreNLP (Manning et al., 2014) and spaCy (Honnibal et al., 2020)) that use proprietary annotation formats, especially when no annotation standards exist for the task. Therefore, in order to use the results of such tools and establish their comparability, a common environment is needed that abstracts from the underlying output formats.
(C) Capturing heterogeneous implementation landscapes: Competing tools for the same task may be implemented in different programming languages that impose different requirements on the runtime behavior of these tools (Bhatnagar et al., 2022). Thus, such tools should be reusable simultaneously and independently of platform and programming language. At the same time, tools may not be available locally, but only, e.g., via an API (e.g., RESTful, WebSocket). This in turn leads to the requirement that a suitable NLP platform encapsulate the heterogeneity of different implementation landscapes.
(D) Reproducible & reusable annotations: Evaluating automatically generated annotations requires tracking the engines or pipelines that created them. This is important, e.g., if the data is to be shared among scientists. Thus, any application of an NLP pipeline should be reproducible, just as these pipelines should be reusable and modifiable (Trisovic et al., 2020; Jupyter et al., 2018; Cacho and Taghva, 2020). It should be possible to track pipelines at the level of individual documents or sequences thereof. As shown by Leonhardt (2022), this approach can induce a considerable overhead in terms of memory usage and performance, which must be managed by the NLP platform.
(E) Monitoring and error-reporting: Running NLP routines may fail. The reasons for such failures range from errors in the input data, missing information, and platform peculiarities to programming errors. Thus, a powerful monitoring and error-reporting system is needed to troubleshoot and to monitor processing statuses.
(F) Lightweight usability: Utilizing the wide range of NLP methods requires knowledge of computer science and programming. In particular, the maintenance and further development of such tools requires a high level of expertise. This results in two requirements: ease of use and maintenance of the entire infrastructure, and ease of integrating or updating tools.
Currently, there is no framework in NLP that meets criteria A–F. Moreover, the number of usable frameworks has recently decreased significantly due to the Log4j vulnerability (Hiesgen et al., 2022), which occurred in 2021 (see, e.g., the Apache framework DUCC). Text corpora are therefore increasingly processed with proprietary systems, which increases development effort and is ultimately not reusable. We present an approach that meets requirements A–F to overcome this trend: DOCKER UNIFIED UIMA INTERFACE (DUUI), an efficient, horizontally and vertically scalable framework that handles big text data in a simple, modular, and reusable manner, while mitigating annotation- and implementation-related heterogeneities. Similar to TEXTIMAGER (Hemati et al., 2016), DUUI is based on UIMA (Unstructured Information Management Applications) (Ferrucci et al., 2009). UIMA is used in numerous projects and remains an active area of research, notwithstanding its noted shortcomings. There are many proprietary NLP frameworks (see Section 2). DUUI accounts for this heterogeneity: it integrates these frameworks, supports their interfaces, and stores all generated annotations and analysis results in UIMA. UIMA allows for defining software modules for the analysis of unstructured data (e.g., texts). This is done with so-called Analysis Engines (AEs). The results of an analysis are stored in a Common Analysis Structure (CAS) (Götz and Suhre, 2004) with reference to a defined schema description (Type System Descriptors). The CAS is the basic data structure in which data is analyzed and stored with metadata. To process data, each AE receives a CAS object, performs its analyses, and passes the revised CAS object to the next AE. Thereby, each pre-processing step comprises one to several pipelines, which are encapsulated and managed by DUUI.
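The CAS-based processing model described above can be sketched, in highly simplified form, as a chain of analysis engines that each receive a document object, add annotations, and pass it on. The following Python sketch is illustrative only; the class and function names are our own and do not correspond to UIMA's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    kind: str    # annotation type, e.g. "Token"
    begin: int   # character offsets into the document text
    end: int

@dataclass
class Cas:
    """A toy stand-in for UIMA's Common Analysis Structure."""
    text: str
    annotations: list = field(default_factory=list)

def tokenizer_engine(cas: Cas) -> Cas:
    # Add one Token annotation per whitespace-separated word.
    pos = 0
    for word in cas.text.split():
        begin = cas.text.index(word, pos)
        cas.annotations.append(Annotation("Token", begin, begin + len(word)))
        pos = begin + len(word)
    return cas

def acronym_engine(cas: Cas) -> Cas:
    # A second engine that builds on the tokens added before it.
    for token in [a for a in cas.annotations if a.kind == "Token"]:
        if cas.text[token.begin:token.end].isupper():
            cas.annotations.append(Annotation("Acronym", token.begin, token.end))
    return cas

def run_pipeline(cas: Cas, engines) -> Cas:
    # Each engine receives the CAS, analyzes it, and passes the revised CAS on.
    for engine in engines:
        cas = engine(cas)
    return cas

cas = run_pipeline(Cas("DUUI wraps UIMA engines"), [tokenizer_engine, acronym_engine])
```

The key property mirrored here is that the second engine consumes annotations produced by the first, which is exactly the pass-along contract that DUUI's pipelines preserve across process and language boundaries.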
In this paper, we describe the principles and implementation of DUUI. We evaluate its efficiency and show that DUUI meets criteria A–F. In Section 2, we present related work; we introduce DUUI in Section 3. In Section 4, we present an evaluation based on benchmark studies and address issues regarding annotation frameworks. We close with an outlook on future work (Section 5) and a conclusion (Section 6).

Related Work
DUUI's use cases are related to various research areas, including high-performance computing (HPC) and reproducible pipelines.
Several approaches exist to improve application scalability based on UIMA. One of the most common approaches to scaling AEs, the Apache Hadoop cluster (Apache, 2006), is used by Exner and Nugues (2014), Zorn et al. (2013), Nesi et al. (2015), and Aydin and Hallac (2018). It works well for simple pipelines that are decomposable into MapReduce jobs, but has problems with more complex scenarios. Scenarios involving routines that do not entirely fit into main memory are particularly difficult to model with an Apache Hadoop cluster. In addition, detecting and fixing Out Of Memory (OOM) issues appears to be a recurring problem in Apache Hadoop (Xu et al., 2015). Further, complex AEs that combine multiple aggregations after initial information extraction are difficult to accomplish (Götz et al., 2014). An alternative approach is proposed by Hodapp et al. (2016), which follows a similar microservice-centric approach as DUUI.
Despite this similarity, DUUI offers a considerably simpler setup that does not require the laborious installation of an Apache message broker.
An alternative approach based on microservices is the Computation Flow Orchestrator (CFO) (Chakravarti et al., 2019). It uses technologies such as Docker and Kubernetes (https://kubernetes.io) as well as Google's Protocol Buffers interface definition language. CFO organizes analyses as flows and distributes them to individual nodes addressed by an orchestrator, which generates a set of startup scripts and REST interfaces using Docker in preparation.
In this way, one gets access to the individual endpoints and to debug information. This approach is similar to DUUI, but requires time-consuming learning of the scripting language, while native UIMA integration is not directly possible, a problem for many frameworks. Further, its setup is not straightforward, as profound knowledge of Docker and the underlying network bridge is required; alternatively, the makefile would have to be changed to allow the system to be compiled without Docker. Another approach comes from Agerri et al. (2015) and Arshi Saloot and Nghia Pham (2021). It is based on Apache Storm, which also enables distributed processing of texts via microservices. GATE Cloud (Tablan et al., 2013) allows for distributing processes with UIMA components. But since GATE is a large and complex system, its use requires a significant amount of learning.
The Data-Centric Framework (Liu et al., 2020) offers a more complete workflow, which allows for pre-processing, visualization, and post-annotation. It generalizes the UIMA schema to map NLP data in a Python environment based on Forte. This solution lacks the ability to split pre-processing across servers and is neither platform- nor programming-language-independent. In addition, its proprietary use of UIMA makes reuse difficult.
A similar tool is Flyte, implemented in Go and available for Python, Java, and Scala, which distributes pre-processing to Kubernetes containers. This rather complex framework shares some features with DUUI, but getting started and setting it up are challenging. While it is positive that data formats are not predefined, this makes subsequent use in existing frameworks more time-consuming. The same holds for argo, which also uses Kubernetes to realize the mapping and running of workflows.
Using criteria A–F from Section 1, these frameworks can be compared with each other, as shown in Table 1. It should be noted that it is difficult to compare the effort required to use, maintain, or reuse the components of these frameworks (criterion F). However, in addition to providing powerful NLP methods, a desired framework should be easy to use, especially for non-experts. Since no framework meets all the criteria from A to F, we developed DUUI to fill this gap. DUUI enables the implementation of these criteria by using UIMA as an annotation standard. Thus, documents annotated in UIMA can be reused without DUUI. However, feature D concerning reproducibility, which is fulfilled by DUUI, goes further: it refers to the ability to reconstruct and reuse all annotations performed, including analysis engines, models, and settings.
Docker Unified UIMA Interface

The problems described so far make the implementation of all approaches from Section 2 time-consuming and ultimately render their maintenance impossible. Most of these frameworks meet no more than half of the requirements A–F from Section 1 (cf. Table 1) for NLP based on big data.
To overcome this shortcoming, DUUI integrates functions in a modular way, making its operation reproducible and usable by non-experts. It enables source-text-reduced programming and is mainly based on Docker containers, while allowing for the integration of other systems (see Figure 1). DUUI uses so-called COMPOSERs, possibly connected to a database to enable dynamic monitoring, which consist of n DRIVERs, each containing up to m COMPONENTs, each of which encapsulates an Analysis Engine (AE). Since this cross-programming-language system bridges functions given their heterogeneous implementations, it is reasonable to create a unified interface for Docker containers to enable cross-implementation operations. This includes two aspects, as Docker allows individual containers to use different programming languages (Section 1, feature C) and taggers (feature B). Beyond that, processes can also be executed in the local Java Runtime Environment (JRE), as shown in Figure 2.
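The COMPOSER/DRIVER/COMPONENT hierarchy can be illustrated with a minimal sketch. DUUI itself is a Java library; the following Python classes and method names are our own illustration of the containment structure, not DUUI's API:

```python
class Component:
    """Wraps one Analysis Engine (AE)."""
    def __init__(self, name):
        self.name = name

class Driver:
    """A driver manages up to m components in one runtime environment."""
    def __init__(self, kind):
        self.kind = kind          # e.g. "UIMA", "Docker", "Swarm", "Remote"
        self.components = []
    def add(self, component):
        self.components.append(component)
        return self

class Composer:
    """The composer initializes n drivers and controls the document flow."""
    def __init__(self):
        self.drivers = []
    def add_driver(self, driver):
        self.drivers.append(driver)
        return self
    def pipeline(self):
        # Flatten drivers into the ordered list of analysis engines to run.
        return [c.name for d in self.drivers for c in d.components]

composer = (Composer()
            .add_driver(Driver("UIMA").add(Component("BreakIteratorSegmenter")))
            .add_driver(Driver("Docker").add(Component("spaCy"))))
```

The point of the indirection is that each COMPONENT is addressed only through its DRIVER, so the same pipeline definition can mix a locally running JRE engine with a containerized one.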

Composer
The role of the COMPOSER is similar to the role of the master in the MapReduce framework (Dean and Ghemawat, 2004). It initializes all DRIVERs with their respective COMPONENTs, controls the document flow through the pipeline, and reports metrics to the database. The COMPOSER also manages the global Lua (Ierusalimschy et al., 2007) state, providing a shared set of global libraries and security policies to the underlying DRIVERs. Lua is used to realize programming-language-independent communication between the individual AEs (see Section 3.4). By taking control of error handling and advanced control flow mechanisms, every pipeline can have a policy fit to its task. The COMPOSER further monitors processing statuses (Table 1, feature E). For this purpose, an SQLite database and a Docker connection via ArangoDB or InfluxDB are available. Due to our interface design, other database management systems can be added. This eases troubleshooting and performance analyses. Further, a Docker image instantiated in Swarm mode can be used to overview the workload on the Swarm network.

Driver
A DRIVER instantiates, orchestrates, and manages a set of COMPONENTs. Each DRIVER serves as a bridge to a specific functionality, thereby giving almost native access to the underlying software. This enables advanced use cases, e.g., scheduling containers on specific nodes or restricting system resources for specific parts of a pipeline. Because DRIVERs are designed as interfaces, further application environments can be included, extending the given ones. DUUI currently includes four predefined DRIVERs to scale NLP processes horizontally and vertically while running COMPONENTs on the local machine as well as within a Docker swarm:

1. The UIMA DRIVER runs a UIMA AE on the local machine (using local memory and processor) in the same process within the JRE and allows scaling on that machine by replicating the underlying AE (feature A in Section 1). This enables the use of all previous analysis methods based on UIMA AEs without further adjustments.
To avoid continuous reloading, it may be necessary to start a service once in a dedicated mode and then use a REMOTE DRIVER to access it. To use services, their URL must be specified to enable horizontal scaling.

Further DRIVERs can be added to integrate processes in other runtime environments (see Section 5). The encapsulation of the implementation of the AEs is done by means of their COMPONENTs.

Component
A COMPONENT represents an AE according to UIMA and is defined by the instantiating DRIVER, which imposes requirements on its COMPONENT. The implementations of the DRIVERs differ, with the UIMA DRIVER being closest to the AEs of UIMA. The latter instantiates and uses its COMPONENT based solely on the associated AE, so no modification of an existing AE is required. Therefore, existing AEs implemented in Java can be used directly in DUUI, although scaling is limited to the vertical level due to the restriction of execution to the JRE. Unlike the UIMA DRIVER, COMPONENTs of all other DRIVERs (see Section 3.2) must follow the RESTful scheme of DUUI.

Communication
Given heterogeneous programming languages (Section 1), a method for the communication of COMPONENTs is needed that is fast, flexible, and error-tolerant. The communication within DUUI is designed to integrate a variety of programming languages, even without direct access to the UIMA framework. This requires a transformer that converts UIMA documents so that they can be processed by the programming language of the respective COMPONENT. In addition, partial serialization of documents is required because the size of UIMA documents increases rapidly during pre-processing. To meet these requirements, we use Lua as an embedded scripting language for communication. Through fine-grained access and interoperability within a Java Virtual Machine (JVM), Lua allows annotators to precisely define the elements of the documents to be processed and to use all communication formats that Lua or Java support. This flexibility is exemplified by implementations of annotators in Java, Python, and Rust using various formats such as MessagePack, JSON, UIMA XMI, and UIMA Binary. Figures 12 and 13 (appendix) compare this in detail. By using Lua (see Figure 15 (appendix)), DUUI meets features B and C.
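The idea of letting each annotator declare exactly which parts of the document it needs can be sketched as follows. DUUI realizes this transformation step with Lua scripts inside the JVM; this is only a simplified Python analogue with illustrative field names:

```python
import json

def partial_serialize(cas, wanted_types):
    """Serialize only the annotation types an annotator asked for."""
    return json.dumps({
        "text": cas["text"],
        "annotations": [a for a in cas["annotations"] if a["type"] in wanted_types],
    })

cas = {
    "text": "DUUI uses Lua.",
    "annotations": [
        {"type": "Token", "begin": 0, "end": 4},
        {"type": "Token", "begin": 5, "end": 9},
        {"type": "POS", "begin": 0, "end": 4, "value": "NE"},
    ],
}

# An annotator that only needs tokens receives a smaller payload
# than a full serialization of the document would produce.
payload = partial_serialize(cas, {"Token"})
```

Since UIMA documents grow with every pre-processing step, filtering before serialization saves bandwidth and CPU cycles on both sides of the language bridge.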

Lua Sandbox
Although the potential of Lua is considerable, one aspect deserves special attention because it can harm the host system: a Lua script allows the client COMPONENT to execute invoked methods without verification by the host system, allowing unrestricted access to the host. To solve this problem, we implemented a "sandbox" that limits the number of statements executed by the Lua interpreter before it aborts execution. The sandbox works like a class loader, restricting Lua calls and allowing only a certain number of named classes. This means that the COMPOSER (Figure 1) can be equipped with a firewall (the so-called sandbox) that restricts Lua scripts to authorized classes only, like a whitelist. Therefore, all other classes on the host system (COMPOSER) are not accessible (as shown in Figure 14, lines 07–10).

Reproducibility
DUUI fulfills feature D: reproducibility at the document level. Each pipeline component is fully serializable and annotates each processed document. The reproducibility level of a component depends on its type: COMPONENTs of the UIMA DRIVER and the REMOTE DRIVER are less reproducible than the COMPONENTs of other drivers. For the REMOTE DRIVER, this is partly because the endpoint providing the annotation service cannot be controlled. Because of this design, users can provide services that are copyrighted or otherwise legally restricted without having to redistribute the software. Despite the specification of the full COMPONENT in the processed document, different package managers and the inability of Java to provide source code along with the package version mean that the respective pipelines cannot be reproduced without the help of the user. The DOCKER DRIVER and the SWARM DRIVER overcome these limitations by enabling pipeline replication across multiple hosts and environments.
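Document-level reproducibility amounts to stamping every processed document with the full specification of each component that touched it. A minimal sketch of this provenance record (all field names are illustrative, not DUUI's actual schema):

```python
import json
import hashlib

def stamp(document, component_spec):
    """Record which component (image, version, settings) produced the annotations."""
    canonical = json.dumps(component_spec, sort_keys=True)
    document.setdefault("provenance", []).append({
        "component": component_spec,
        # A stable fingerprint lets two documents be checked for
        # having been processed by the identical pipeline step.
        "fingerprint": hashlib.sha256(canonical.encode()).hexdigest()[:12],
    })
    return document

doc = {"text": "...", "provenance": []}
doc = stamp(doc, {"image": "duui-spacy", "version": "3.5",
                  "model": "de_core_news_sm"})
```

For container-based drivers this record can name an immutable Docker image, which is why they reach a higher reproducibility level than components whose runtime environment is outside the pipeline's control.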

DUUI-integration
The goal of DUUI is the straightforward integration of NLP routines on three levels: (1) Operation of existing and new AEs without the need to integrate a new library (e.g. Flyte, argo) or build a more complex infrastructure (e.g. Flyte, GATE, CFO), while integrating AEs implemented in different programming languages. (2) Integration of simple to complex AEs by connecting self-contained tools with DUUI, which provides a set of existing components as Docker images; each routine runs as usual, with its results returned to the COMPOSER only via REST to keep the overall integration manageable. (3) Use of DUUI within existing infrastructures, which requires that they use XMI as an exchange format; otherwise, existing readers or writers (e.g. CoNLL, JSON, etc.) must be used. DUUI can be used directly via Java or a terminal call by passing the documents to be processed. This is certainly the most complex solution compared to the previous points, but it can be encapsulated by a web API. In any event, DUUI can be implemented and used with manageable to low effort, as required by feature F. All DUUI libraries can be included via Maven (see Figure 14 in the appendix for an example setup).

Evaluation
We conducted several tests to show the feasibility of DUUI. Besides the projects listed in Section 2, there is only one project that is not implemented in Java and enables UIMA processing directly in Python: dkpro-cassis (Klie and de Castilho, 2020). As tools are increasingly developed exclusively in Python, the use of a native Python library is important for UIMA, and its performance characteristics are a key basis for comparison with implementations in non-native (e.g., non-Java) programming languages. We computed benchmarks for the German parliamentary corpus (Barbaresi, 2018) and the largest corpus of German parliamentary minutes, GerParCor (Abrami et al., 2022), which contains 37,881 minutes from three centuries, annotated with spaCy. At first sight, the subset relation of both corpora seems odd, but their individual processing is an important indicator, as will be shown.

Serialization
To evaluate the serialization speed between Java and Python (using dkpro-cassis as a reference), all texts in the test corpus are processed with spaCy and BreakIteratorSegmenter. To this end, we use the Barbaresi (2018) corpus, since GerParCor (due to its coverage period) contains characters that are not XML 1.0 conformant and thus cannot be processed by dkpro-cassis. Figures 3 and 4 show that the serialization speed is linear for the Java CAS implementation and linear with a tendency to be polynomial for dkpro-cassis. In any event, the former outperforms the latter by a large margin (see Table 2 and Figure 5). The serialization performance of dkpro-cassis quickly degrades when handling larger documents (see Figure 6). Moreover, its mandatory use of XML 1.0 limits the processable character set, in contrast to the Lua communication of DUUI.
Using UIMA documents directly in Python via dkpro-cassis is a step ahead. But since more and more NLP tools are implemented in Python, (de-)serialization in this language leads to a significant increase in runtime depending on document size.

Partial serialization
Often, annotators need only a subset of all available annotations. Therefore, it is important to allow partial serializations of a document to save bandwidth and CPU cycles. Since we introduced the Lua bridge for communication between annotators of different programming languages, we next compared document (de-)serialization between Java, Python, and Rust annotators. To this end, we developed the first annotator in Rust that is usable within a UIMA pipeline. The task was to serialize all tokens from GerParCor. The reason for this choice was that token annotations account for about one-third of all annotations in documents. Figure 7 shows a clear disadvantage of the Python implementation compared to Java and Rust, whose times are linear in the number of annotations.

Bottleneck
COMPOSERs execute Lua code to transform CAS documents into a domain processable by the respective annotator and to transform the annotator's response back into the CAS domain. This creates a bottleneck, as this conversion is currently performed on the executing system. We use Amdahl's law (Rodgers, 1985) to investigate this bottleneck: it describes a program's speedup with two parameters, the fraction p of the program that is parallelizable and the number s of concurrent executors working on the parallelizable fraction of the program. We process GerParCor with the spaCy COMPONENT and deduce its parallelizable fraction from the mean time spent in the COMPONENT relative to the total time for the COMPONENT to run. Figure 16 in the appendix shows the parameter space and the balance between the computational power of the computer running the COMPOSER and the number of instances that can be distributed over any available network.
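Amdahl's law gives the expected speedup as S(s) = 1 / ((1 - p) + p/s). A small sketch of how such a parameter space can be explored (the value p = 0.95 below is an arbitrary illustration, not the fraction measured for the spaCy COMPONENT):

```python
def amdahl_speedup(p, s):
    """Speedup for parallelizable fraction p with s concurrent executors."""
    return 1.0 / ((1.0 - p) + p / s)

# Even with unlimited executors the speedup is capped by the serial
# fraction: for p = 0.95 the asymptotic limit is 1 / 0.05 = 20.
speedups = {s: round(amdahl_speedup(0.95, s), 2) for s in (1, 8, 64, 1024)}
```

This is why the serial Lua conversion step on the COMPOSER matters: however many annotator instances are added to the network, the non-parallelizable fraction bounds the achievable speedup.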

DUUI benchmarks
The major bottleneck is the (de-)serialization of documents. Since this factor increases linearly and is greatly reduced by the use of Lua, there remains the processing time within each AE. To evaluate the usability of DUUI, we performed runtime measurements in different scenarios. Processing with DUUI for spaCy was performed for a randomized sample of GerParCor with 100, 200, 500, and 1,000 documents using 1, 2, 4, and 8 Docker instances. Figure 8 shows the results: the processing time improves considerably with an increasing number of processes. This is evident in both variants, local Docker and Swarm, although increasing the number of local Docker instances to more than 8 would not be practical due to hardware limitations in the evaluation setup. Samples 100 and 500 show that a reduction in processing time can be achieved by increasing the number of instances. However, this is not always the case, as depending on the system load, the total runtime can also be higher for multiple processes (e.g. samples 200 and 1,000 with 8 Docker instances). Although similar behavior occurs in Swarm mode, the overall processing time is higher compared to local Docker because the network-based I/O of the components consumes additional time. In the setup used, no prioritization of the available swarm nodes was done, so the usage of each node is random. Since the individual nodes have different hardware performance and are located in different network segments, this results in lower performance for large documents. The result in Figure 17 (see appendix), using up to 32 instances in Swarm mode, shows similar behavior, although here the sample was not randomized; instead, the n smallest documents were chosen in each case to neglect network latencies. This shows that the number of COMPONENTs in a swarm network can be dynamically increased using DUUI.

Next, we compared DUUI with Flyte and CFO (for the results, see Figures 9 and 10). For the comparison with Flyte, documents from the Barbaresi (2018) corpus were preprocessed with spaCy, using XMI as I/O. For both tools, a Python script with dkpro-cassis was implemented to (de-)serialize the XMI documents. When comparing with CFO, the same corpus but a different library was used to enrich the input documents; this concerns tokenization using NLTK (Bird et al., 2009). The results show the clear advantages of DUUI over Flyte in processing UIMA documents, while it is on par with CFO but increases the versatility of the data flow by embedding a Turing-complete language.

Figure 9: Runtime comparison between DUUI and Flyte for processing German political speeches (Barbaresi, 2018) with 1, 2, 4, and 8 threads, respectively. The single-core variant for Flyte was run outside; the variants with 2, 4, and 8 threads were run using the Docker-in-Docker demo provided by the Flyte sandbox. In DUUI, all processing ran in a spaCy COMPONENT within the DOCKER DRIVER.

Future Work
Since there are container solutions besides Docker (e.g., OpenShift, Kubernetes, and Podman), they are candidates for further DUUI DRIVERs. This would solve a problem with Docker, as its GPU usage is limited, an issue that has been pending for over six years.
Figure 17: Unlike Figure 8, a sample with the same number of documents, but very small ones, was selected.

Figure 1: The modular architecture of DUUI.

Figure 2: An example DUUI pipeline which, like all DUUI pipelines, is part of a Java application: a composer is instantiated containing a set of drivers and associated components. Several pipelines are registered as components in corresponding drivers. Each driver defines the communication between the implementations of the components. UIMA and Remote components are executed in the Java Runtime Environment (JRE). Components can also be run as Docker containers, that is, via the DOCKER DRIVER or the SWARM DRIVER. In the first driver, services are used as Docker images running on the local system as Docker containers to provide vertical scaling. The second driver also uses Docker images, but distributes them in a Swarm network of attached Docker nodes to provide horizontal scaling. For security reasons, using Docker Swarm is only possible if the pipeline is initially executed on the Docker leader node.

Figure 4: Java (de-)serialization speed in seconds plotted against the total number of document annotations.

Figure 5: Java and Python (de-)serialization speed in seconds plotted against the total number of document annotations.

Figure 6: Document selection by quantile number: e.g., 0.8 refers to all documents that make up the top 20% of the largest documents. The frameworks are compared by their mean (de-)serialization time on each quantile.

Figure 11: Research interest in UIMA in previous years, based on the number of papers mentioning UIMA in their full text (Dimensions.AI).

Figure 12:

Figure 13:

Figure 14: A lightweight implementation of an example pipeline with DUUI, based on Docker images and an XMI writer.
2. The DOCKER DRIVER, the DUUI core DRIVER, runs COMPONENTs on the local Docker daemon and enables machine-specific resource management (vertical scaling, feature A). This requires that the AEs are available as Docker images according to DUUI to run as Docker containers. It is not relevant whether the Docker image is stored locally or in a remote registry, since the Docker container is built on startup. This makes it very easy to test new AEs (as local containers) before they are released. The distinction between local and remote Docker images is made via the URI of the Docker image used.

3. The SWARM DRIVER complements the DOCKER DRIVER; it uses the same functionalities, but its AEs are used as Docker images distributed within the Docker Swarm network (horizontal scaling, feature A). A swarm consists of n nodes and is controlled by a leader node within the Docker framework. However, if an application using DUUI is executed on a Docker leader node, the individual AEs can be executed on multiple swarm nodes.

4. The REMOTE DRIVER addresses AEs that are not available as containers and whose models cannot or should not be shared; they can still be used if they are available via REST. Since DUUI communicates via RESTful interfaces, remote endpoints can be used for pre-processing. In general, AEs implemented based on DUUI can be accessed and used via REST, but scaling is limited by the request and processing capabilities of the hosting system. In addition, COMPONENTs addressed via the REMOTE DRIVER can be used as services. This has advantages for AEs that need to hold large models in memory and thus require a long startup time.

Table 2: (De-)serialization performance of the Java XMI and the Python dkpro-cassis implementations.