Using and comparing Rhetorical Structure Theory parsers with rst-workbench

I present rst-workbench, a software package that simplifies the installation and usage of numerous end-to-end Rhetorical Structure Theory (RST) parsers. The tool offers a web-based interface that allows users to enter text and let multiple RST parsers generate analyses concurrently. The resulting RST trees can be compared visually, manually post-edited (in the browser) and stored for later usage.


Introduction
Rhetorical Structure Theory (RST) provides a formalism for hierarchical text organization that can be applied to a wide range of natural language processing tasks, from text generation (Marcu, 1997; Konstas and Lapata, 2013) to the assessment of conversational patterns of Alzheimer's patients (Abdalla et al., 2018; Paulino et al., 2018).
Most research on RST parsing is focused on parser engineering, i.e. the evaluation of parsers against a "gold standard" hand-annotated dataset. Although RST corpora exist for a variety of other languages (e.g. German, Dutch and Spanish), end-to-end discourse parsers are usually only trained and evaluated on English data, with the notable exception of Braud et al. (2017a,b,c).
There are a number of RST parsers that were developed for other languages, but they are either not publicly available (e.g. Reitter (2003) for German and English as well as Pardo and Nunes (2008) for Portuguese) or do not produce complete RST analyses, e.g. Sumita et al. (1992) for Japanese (no intra-sentence relations) and da Cunha et al. (2012) for Spanish (no inter-sentence relations).
1 The rst-workbench and all related Docker configuration files, images and REST API wrappers around the RST parsers are available from https://github.com/arne-cl/rst-workbench. An online demo is provided at https://rst-workbench.arne.cl/.
Compared to other NLP tasks like syntax parsing, the amount of available training data is limited, with corpora ranging from dozens to a few hundred hand-annotated texts. There is also little work evaluating RST parsers beyond the Parseval-based procedure proposed by Marcu (2000). For example, machine learning models are usually not compared with respect to their ability to detect rare rhetorical relations. Zhang and Liu (2016) found that rhetorical relations in RST-DT at different levels (i.e. between clauses within sentences, between sentences within paragraphs and between paragraphs) all follow the same Zipf's law-related distribution. Individual relations show different patterns, e.g. Attribution is more common in intrasentential relations than on higher levels.
In addition, no systematic review exists of the impact of the preprocessing steps-sentence splitting, syntax parsing and segmentation into Elementary Discourse Units (EDUs)-on the quality of the resulting RST parses. There is some work in this direction, though.
For example, Surdeanu et al. (2015) implemented two RST parsers that only differ in the syntax parser used-the constituent-based RST parser produced slightly better results, but the dependency-based equivalent was 2.5 times faster. Braud et al. (2017c) evaluated the influence of syntactic information (using either constituency parses, dependency parses or only POS tags) on discourse segmentation. Rutherford et al. (2017) reviewed the impact of different neural network architectures on implicit discourse relation detection.
While Huber and Carenini (2020) showed that RST parser performance can be improved by training them on large RST treebanks automatically generated using distant supervision, I hypothesize that RST parsers can profit even more from larger human-annotated training corpora.
In turn, the annotation of RST corpora can likely be sped up by leveraging RST parsers. Just as translators can produce high-quality translations more quickly by post-editing machine-translated texts than by manual translation alone (Gaspari et al., 2014; Koponen, 2016), I assume that linguists can produce RST analyses faster with machine support (i.e. by selecting the best automatic analysis from a number of RST parsers and then post-editing it) than by relying on hand-annotation alone.
If the goal is to make annotators use RST parsers productively, the parsers need to be adapted to meet their needs. While the primary focus of RST parser development is improving upon state-of-the-art benchmark results, this work focuses on usability and compatibility, i.e. the parsers need to be easy to install and run while supporting the same format(s) that common RST annotation and visualization tools use.
To achieve this, I implemented rst-workbench, which:
• acts as a web-based front-end to six different RST parsers,
• provides an easy way to install the parsers on all modern desktop operating systems using Docker containers,
• facilitates their integration into NLP pipelines by wrapping them in REST APIs,
• enables the RST analyses produced by the parsers to be visualized by and edited in the rstWeb annotation tool (Zeldes, 2016) by amending it with a REST API and by providing converters from the parsers' output formats to rstWeb's input format.
The remainder of this paper is organized as follows. Section 2 gives a brief overview of related work, while Section 3 describes the architecture and usage of the system. Section 4 summarizes the main conclusions and outlines areas of future work.

Related work
To the best of my knowledge, rst-workbench is the first tool that integrates several RST parsers and offers a graphical user interface for them. Besides rstWeb (Zeldes, 2016), which is integrated in this software, there are two other annotation tools specifically made for rhetorical structures: RSTTool (O'Donnell, 2000) and TreeAnnotator (Helfrich et al., 2018). For visualizing RST trees and querying RST corpora, there is ANNIS3 (Krause and Zeldes, 2014).
While I implemented very minimal REST APIs around the individual RST parsers in Python, CLAM (van Gompel and Reynaert, 2014) could be used to create REST API wrappers around command-line NLP tools by writing a configuration file. For simple cases, it is slightly more complicated to set up than my approach (cf. Section 3.2), but it comes with many additional features (e.g. user authentication, batch processing and a generic web interface for each API) and can be extended with additional format converters and visualization components.

Software architecture and usage
The rst-workbench provides a simple way to install multiple RST parsers on a computer, run them, and visually compare and edit their analyses in a web browser. Its architecture and usage are summarized in Figure 1. Screenshots of the workflow are provided in Figures 2 and 3.
Users do not have to learn the different command-line interfaces of the parsers, but can simply interact with them via a web browser. To make this possible, I added a REST API to each of the parsers, which the browser can talk to (cf. Section 3.2). In the browser, annotators can enter text or upload a plain text document. After clicking the "Run Parsers" button, all RST parsers are started concurrently to analyze the given text. The results appear asynchronously in the browser, i.e. the user sees the result of the fastest parser as soon as it is available and does not have to wait for the remaining parsers to finish processing. Users can then select the parse tree that best matches their linguistic intuition, and click "Edit in rstWeb" to load the analysis into the rstWeb annotation tool. Here, all aspects of the RST tree can be modified, e.g. rhetorical relations between EDUs and/or larger subtrees can be replaced (Figure 3). Afterwards, the result can be saved locally for further inspection or corpus creation.
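The "start all parsers at once, show each result as soon as it arrives" behaviour described above can be illustrated with a small Python sketch. It is not the actual rst-workbench front-end code (which runs in the browser); the parser names, delays and the fake parser functions are assumptions made purely for illustration, standing in for calls to the parsers' REST APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_fake_parser(name, delay):
    """Return a stand-in parser; a real one would call a parser's REST API."""
    def parse(text):
        time.sleep(delay)  # simulate the time a parser needs
        return f"{name} analysis of {len(text.split())} tokens"
    return parse


# Hypothetical parser names and delays, for illustration only.
PARSERS = {
    "slow-parser": make_fake_parser("slow-parser", 0.3),
    "fast-parser": make_fake_parser("fast-parser", 0.05),
}


def run_parsers(text, parsers=PARSERS):
    """Start all parsers concurrently and yield (name, result) as each finishes."""
    with ThreadPoolExecutor(max_workers=len(parsers)) as pool:
        futures = {pool.submit(fn, text): name for name, fn in parsers.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()
```

Iterating over run_parsers("Although it rained, we went out.") yields the result of the faster stand-in parser first, without waiting for the slower one to finish.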
The technical setup needed to integrate all these stand-alone tools into one software package with a unified interface is described in the following subsections.

Docker
Docker is a tool that allows programmers to bundle a piece of software with all its dependencies into a container, which a user can reproducibly install on any computer running Linux, macOS or Windows without having to know any details about the software. The step-by-step installation process of a software package is described in a so-called Dockerfile, which is readable by both machines and humans.
Installing an RST parser from a Dockerfile will save the user the effort of finding its dependencies, installation parameters and training settings. This will not, however, reduce the run time of the installation and training process, as that will happen on the user's local machine. This process can be drastically sped up by using a Docker image, which is a compressed file that contains the results of running a Dockerfile. I provide Docker images for all but one of the RST parsers available in the rst-workbench. 3 If Docker is already running on the target system, the installation of an RST parser boils down to a one-line command. 4 At this point, the parsers can be used without tedious installation procedures, but are still only available as individual command-line tools with different parameters and output formats. To improve their usability, I make them available as web services with a common interface (cf. Section 3.2). To improve their comparability, I offer a simple way to convert their output to a common format and generate visualizations of the resulting RST trees (Section 3.3).

Web application and REST APIs
With rst-workbench, I aim to make RST parsers more accessible to a wider audience by providing a common (graphical) interface for them. I chose to implement this in the form of a web application, which talks to the individual RST parsers via REST (Fielding, 2000), a simple architectural style, typically built on HTTP, that is commonly used by programs running on different computers to communicate with each other via the Internet. I implemented REST interfaces for each of the parsers using the Python hug library. 5 They receive requests containing the text to be analyzed, run the actual (command-line) RST parsers in the background on the given input, capture their outputs and send them back to the requesting program. Using REST allows the user to run the parsers on different machines than the web application (in case one computer does not have enough RAM or processing power to run all RST parsers at the same time) and even to use the parsers as web services without the front-end, e.g. to integrate them into custom NLP pipelines. It also simplifies the process of adding more parsers to the rst-workbench, as it only needs to know where the parsers run and which output format they use.
3 All Docker images for the rst-workbench are available at https://hub.docker.com/u/nlpbox. I can't provide an image for the HILDA RST parser (Hernault et al., 2010), as its license does not allow its source code to be freely distributed. Nevertheless, if you have access to the HILDA source code, you can simply build an image using the Dockerfile provided at https://github.com/NLPbox/hilda-docker.
4 For example, docker run nlpbox/heilman-sagae-2015 for the Heilman and Sagae (2015) parser.
5 http://www.hug.rest/
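The request cycle described above (receive text, run a command-line parser in the background, capture its output, send it back) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the hug-based wrappers shipped with rst-workbench: the /parse endpoint name, the JSON response layout and the stand-in "parser" command (which merely upper-cases its input file) are all hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile
from http.server import BaseHTTPRequestHandler

# Stand-in for a real command-line RST parser (assumption): it prints the
# upper-cased content of the file given as its first argument.
PARSER_COMMAND = [
    sys.executable, "-c",
    "import sys; print(open(sys.argv[1], encoding='utf-8').read().upper())",
]


def run_parser(text, command=PARSER_COMMAND):
    """Write text to a temporary file, run the parser on it, return its stdout."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as handle:
        handle.write(text)
        path = handle.name
    try:
        completed = subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
        return completed.stdout
    finally:
        os.remove(path)  # clean up the temporary input file


class ParserHandler(BaseHTTPRequestHandler):
    """Answer POST /parse (hypothetical endpoint) with the parser output as JSON."""

    def do_POST(self):
        if self.path != "/parse":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        try:
            payload, status = {"output": run_parser(text)}, 200
        except subprocess.CalledProcessError as error:
            payload, status = {"error": error.stderr}, 500
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Such a handler could be served with http.server.HTTPServer(("localhost", 8000), ParserHandler).serve_forever(), where the port is again an arbitrary choice. Because the wrapper only depends on the command it runs, swapping in a different parser would mean changing PARSER_COMMAND and nothing else.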

discoursegraphs and rstWeb
The interoperability of RST tools is hindered by the lack of a standard format for encoding RST analyses. While corpora use either the LISP-like dis or the XML-based rs3 format, RST parsers use a plethora of custom formats. rst-workbench is able to convert many of them into rs3, the format supported by RST annotation tools like RSTTool (O'Donnell, 2000) and rstWeb (Zeldes, 2016), by utilizing a REST service I implemented on top of the discoursegraphs converter library (Neumann, 2015, 2016), which supports all RST file formats of the given parsers. Using the rs3 format and integrating rstWeb into the rst-workbench allows it to leverage rstWeb's capabilities to visualize and (post)-edit RST trees.
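To give an impression of what a conversion target in rs3 looks like, the following Python sketch writes a heavily simplified rs3 document from a list of EDUs and nucleus-satellite relations. It does not use or reproduce the discoursegraphs API; the function name and input layout are assumptions, and the sketch ignores multinuclear relations and group nodes, which real rs3 files also contain.

```python
import xml.etree.ElementTree as ET


def to_rs3(edus, relations):
    """Build a minimal rs3 string.

    edus: list of EDU strings; the i-th EDU gets the id i (starting at 1).
    relations: list of (satellite_id, nucleus_id, relation_name) tuples.
    """
    root = ET.Element("rst")
    header = ET.SubElement(root, "header")
    rel_list = ET.SubElement(header, "relations")
    # Declare every relation name used in the analysis once.
    for name in sorted({name for _, _, name in relations}):
        ET.SubElement(rel_list, "rel", name=name, type="rst")

    body = ET.SubElement(root, "body")
    links = {sat: (nuc, name) for sat, nuc, name in relations}
    for index, text in enumerate(edus, start=1):
        attributes = {"id": str(index)}
        if index in links:  # a satellite points to its nucleus via "parent"
            nucleus, name = links[index]
            attributes["parent"] = str(nucleus)
            attributes["relname"] = name
        segment = ET.SubElement(body, "segment", attributes)
        segment.text = text
    return ET.tostring(root, encoding="unicode")
```

Calling to_rs3(["Although it rained,", "we went out."], [(1, 2, "concession")]) produces one segment element per EDU, with the first segment attached to the second by a concession relation.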

Conclusions
In this paper, I presented a software package that simplifies the installation, usage and visual comparison of RST parsers. I showed how it can help linguists to produce manual RST analyses with less effort.
I plan to integrate the rst-workbench directly into rstWeb to facilitate corpus annotation projects. In rstWeb, an "administrator" can create an annotation project, upload documents to be annotated and assign them to annotators. With an integrated rst-workbench, the administrator could precompute and store the automatic RST analyses once for all annotators, which would reduce wait time for the annotators and allow them to work without switching browser tabs and tools.
In the current setup, analyzing a text with all parsers may take up to two minutes. Most of this time is spent on loading the models of the underlying syntax parsers. This can be drastically reduced by "outsourcing" the syntax parsers into their own web services, as I have already done for DPLP.