InVeRo-XL: Making Cross-Lingual Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Notwithstanding the growing interest in cross-lingual techniques for Natural Language Processing, there has been a surprisingly small number of efforts aimed at the development of easy-to-use tools for cross-lingual Semantic Role Labeling. In this paper, we fill this gap and present InVeRo-XL, an off-the-shelf state-of-the-art system capable of annotating text with predicate sense and semantic role labels from 7 predicate-argument structure inventories in more than 40 languages. We hope that our system – with its easy-to-use RESTful API and Web interface – will become a valuable tool for the research community, encouraging the integration of sentence-level semantics into cross-lingual downstream tasks. InVeRo-XL is available online at http://nlp.uniroma1.it/invero.


Introduction
Informally, Semantic Role Labeling (SRL) is often defined as the task of automatically answering the question "Who did What, to Whom, Where, When, and How?" (Màrquez et al., 2008). More precisely, SRL aims at recovering the predicate-argument structures within a sentence, providing an explicit overlay that uncovers the underlying semantics of text. For this reason, SRL is thought to be key in enabling Natural Language Understanding (Navigli, 2018). Today, SRL is still an open problem, with several research papers being published each year at top-tier conferences, revealing novel insights and proposing better approaches. Over the years, thanks to this active development, SRL has been successfully exploited in a wide array of downstream tasks that span different areas of Artificial Intelligence, from Natural Language Processing (NLP), with Information Retrieval (Christensen et al., 2010), Question Answering (He et al., 2015), Machine Translation (Marcheggiani et al., 2018) and Semantic Parsing (Banarescu et al., 2013), to Computer Vision, with Visual Semantic Role Labeling (Gupta and Malik, 2015) and Situation Recognition (Yatskar et al., 2016).
Recently, the growing interest in cross-lingual NLP, supported by the increasingly wide availability of pretrained multilingual language models such as BERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020), has sparked renewed interest in multilingual and cross-lingual SRL. In just a few years, researchers have found ways to design fully-neural end-to-end systems for SRL (Cai et al., 2018), to take advantage of contextual word representations (Peters et al., 2018), to achieve high performance on multiple languages (He et al., 2019a), to generate sense and role labels with sequence-to-sequence models (Blloshmi et al., 2021) and to perform SRL jointly across heterogeneous inventories.
Since SRL is a task that involves complex linguistic theories, inventories and techniques, there have been efforts to develop easy-to-use tools that offer automatic predicate sense and semantic role annotations to users interested in the integration of sentence-level semantics into downstream tasks. Some notable examples include SENNA 1 (Collobert et al., 2011), which uses an ensemble of feature-based classifiers (Koomen et al., 2005), AllenNLP's SRL demo 2 , which provides a reimplementation of a BERT-based model (Shi and Lin, 2019), and InVeRo, which offers annotations according to two different linguistic inventories, PropBank (Palmer et al., 2005) and VerbAtlas (Di Fabio et al., 2019). However, one important drawback of the above-mentioned tools is that they are able to perform SRL only in English, which hinders the exploitation of their annotations in multilingual and cross-lingual NLP.
In order to fill this gap, we build upon InVeRo and propose its next major release, InVeRo-XL, with the objective of making SRL accessible in multiple languages. We rebuild InVeRo-XL from the ground up to offer: • The first end-to-end system to tackle the whole SRL pipeline in over 40 languages; • The first off-the-shelf system to provide SRL annotations for 7 linguistic inventories; • A RESTful API service that can be queried either online, so as not to install any software, or offline, to maximize throughput; • A Web interface that provides a visualization of the system output which can be useful for teaching purposes, comparing linguistic theories, and prototyping new ideas.
We believe that InVeRo-XL can provide a stepping stone for the integration of explicit sentence-level semantics into cross-lingual tasks, attracting new researchers to the field of SRL and its applications. 3

What's New in InVeRo-XL
As previously mentioned, InVeRo-XL is the successor of InVeRo. Although its main new feature is the ability to provide predicate sense and semantic role annotations in over 40 languages with 7 different inventories, InVeRo-XL has been overhauled to also improve several other important aspects. In particular: • Preprocessing: while its predecessor used a very limited set of rules to preprocess English text, InVeRo-XL features a multilingual preprocessing module based on spaCy and Stanza; • SRL model: the English-only model has been replaced by a cross-lingual model that is able to perform not only span-based SRL, but also dependency-based SRL; • API: the service is now able to handle batched requests and documents of arbitrary length; • Offline usage: InVeRo-XL is now available for download, free for research purposes, allowing users to host their own instance locally.

3 InVeRo-XL can be downloaded upon request at http://nlp.uniroma1.it/resources. InVeRo-XL is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

System Overview
In this Section, we provide an overview of the main components of InVeRo-XL and how they interact, describing in detail the preprocessing module (Section 2.1) and the SRL model (Section 2.2).

Preprocessing
The previous version of the system, InVeRo, preprocessed an English sentence using a very limited and simple set of rules. In order to correctly support more languages, InVeRo-XL now relies on both spaCy (Honnibal et al., 2020) and Stanza (Qi et al., 2020) to deal transparently with document splitting and tokenization. An automatic language detector based on fastText 4 (Joulin et al., 2017) is used to dynamically choose between the two preprocessing tools, depending on the language detected: spaCy is faster for high-resource languages, e.g. English, but also less reliable on lower-resource languages, e.g. Catalan, for which our system falls back to Stanza.
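This dispatch logic can be sketched as follows. The routing table below is an assumption for illustration, since the text only states that spaCy handles high-resource languages (e.g. English) and that the system falls back to Stanza otherwise (e.g. Catalan):

```python
# Sketch of the preprocessing dispatch described above. The set of
# languages routed to spaCy is hypothetical; InVeRo-XL's actual list
# is not specified in the text.
SPACY_LANGS = {"en", "de", "es", "fr", "it", "zh"}

def choose_preprocessor(lang_code: str) -> str:
    """Pick the preprocessing toolkit for a detected language code.

    In the real system, `lang_code` would come from a fastText
    language-identification model, e.g.
    fasttext.load_model("lid.176.bin").predict(text).
    """
    return "spacy" if lang_code in SPACY_LANGS else "stanza"

print(choose_preprocessor("en"))  # high-resource language: spaCy
print(choose_preprocessor("ca"))  # Catalan: fall back to Stanza
```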

Model Architecture
In line with its predecessor, InVeRo-XL encapsulates an SRL model that falls within the broad category of end-to-end systems, tackling the whole SRL pipeline (predicate identification, predicate sense disambiguation, argument identification and argument classification) in a single forward pass. However, the design of the SRL model itself has been completely revamped and now follows a recently proposed architecture that is capable of performing cross-lingual SRL with heterogeneous linguistic inventories. In the following, we describe the main components of the SRL model architecture provided by InVeRo-XL.
Multilingual word encoder. The first component of our model is a multilingual word encoder that takes advantage of a pretrained language model, XLM-RoBERTa (Conneau et al., 2020), to provide rich contextualized word representations.
More formally, for each word w_i in an input sentence w = w_1, w_2, ..., w_n of length n, it computes an encoding e_i = Swish(W_w h_i + b_w) as a non-linear projection of the concatenation h_i of the corresponding hidden states of the four topmost layers of the language model.
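In plain code, this projection amounts to an affine map followed by the Swish activation. The following pure-Python sketch uses toy dimensions rather than real transformer hidden states, and is only meant to make the formula concrete:

```python
import math

def swish(x: float) -> float:
    # Swish(x) = x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def word_encoding(h_i, W_w, b_w):
    """e_i = Swish(W_w h_i + b_w).

    h_i is the concatenation of the hidden states of the four topmost
    layers of the language model; here it is simply a list of floats.
    """
    return [
        swish(sum(w * h for w, h in zip(row, h_i)) + b)
        for row, b in zip(W_w, b_w)
    ]

# Toy example: 2-dimensional identity projection with zero bias.
e = word_encoding([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```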
Universal word encoder. The resulting sequence of multilingual word encodings E = e_1, e_2, ..., e_n is then given to a "universal" word encoder that computes a sequence of task-specific timestep encodings T = t_1, t_2, ..., t_n as follows:

t_i^0 = e_i,    t^j = BiLSTM_j(t^{j-1}),    t_i = t_i^K,

where t_i^j is the i-th timestep of the j-th BiLSTM layer and K is the total number of BiLSTM layers. The purpose of this encoder is to create representations that are shared across languages and inventories and are, therefore, "universal".
Universal predicate-argument encoder. Similarly to the encoder above, the objective of the universal predicate-argument encoder is to build predicate-specific argument representations that lie in a vector space shared across languages and inventories. Assuming that w_p is a predicate in the input sentence w, this encoder builds a sequence A = a_1, a_2, ..., a_n of predicate-specific argument encodings as follows:

a_i^0 = t_p ⊕ t_i,    a^j = BiLSTM_j(a^{j-1}),    a_i = a_i^K,

where t_i is the i-th timestep encoding from the universal word encoder, ⊕ denotes concatenation, a_i^j is the argument encoding for w_i with respect to w_p produced after the j-th BiLSTM layer, and K is the total number of BiLSTM layers.
Inventory-specific decoders. Finally, the universal encodings are given to a set of classifiers in order to obtain the desired output labels. More specifically, for each inventory, we need three types of output: i) whether a word w_i is a predicate w_p; ii) the most appropriate sense s for a predicate w_p; iii) which semantic role r, possibly the null role, exists between a word w_i and a predicate w_p. More formally, our model features three classifiers for each inventory I as follows:

p_i = σ^I_pred(t_i),    s_p = σ^I_sense(t_p),    r_{p,i} = σ^I_role(a_i),

where each σ^I_·(·) provides a score distribution over the possible output classes, i.e. two (true or false) for predicate identification, the number of senses of an inventory for predicate sense disambiguation, and the number of semantic roles (including the null role) of an inventory for argument labeling.
Miscellanea. While we follow the architecture described above, the SRL model of InVeRo-XL also comes with a small but significant number of enhancements. One such enhancement is that, while the original architecture performs dependency-based SRL, our model is also able to perform span-based SRL by treating spans as sequences of BIO tags. In order to correctly decode valid spans at inference time, InVeRo-XL makes use of a Viterbi decoder. Other improvements include training the model with the RAdam optimizer (Liu et al., 2020), ensuring that each training batch features a balanced number of instances for each language in the training set, and searching randomly for better hyperparameter values.

Evaluation
Datasets. We report the performance of InVeRo-XL on two gold standard benchmarks for SRL: CoNLL-2009 (Hajič et al., 2009) for dependency-based SRL and CoNLL-2012 (Pradhan et al., 2012) for span-based SRL. To the best of our knowledge, CoNLL-2009 is the largest benchmark for multilingual SRL as it comprises six languages, namely, Catalan, Chinese, Czech, English, German and Spanish. 5 The main challenge of this benchmark is that each language was annotated with a different predicate-argument structure inventory, e.g. the English PropBank (Palmer et al., 2005) for English, AnCora (Taulé et al., 2008) for Spanish/Catalan and PDT-Vallex (Hajič et al., 2003) for Czech. While CoNLL-2009 is an ideal test bed for evaluating the multilingual capabilities of an SRL system, dependency-based annotations may look unfamiliar to end users who are not used to the notion of syntactic/semantic heads. Therefore, differently from previous work, we also adapt the system to perform span-based SRL and evaluate its effectiveness on the standard English datasets of CoNLL-2012 and on CoNLL-2009, converting dependency-based annotations to span-based annotations. We convert an argument head to an argument span by considering all those words that fall in the syntactic subtree whose root is the argument head, and discarding all those predicates for which this conversion produces overlapping spans.
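The head-to-span conversion described above can be sketched in a few lines. The data representation (a map from each word index to the indices of its syntactic dependents) is an assumption for illustration:

```python
def head_to_span(head, children):
    """Convert an argument head to the span covering its syntactic subtree.

    `head` is the word index of the argument head, and `children` maps a
    word index to the indices of its syntactic dependents. Returns the
    (start, end) word indices of the resulting span, inclusive.
    """
    subtree, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node not in subtree:
            subtree.add(node)
            stack.extend(children.get(node, []))
    return min(subtree), max(subtree)

def overlaps(a, b):
    """True if two (start, end) spans share at least one word. Predicates
    whose converted argument spans overlap are discarded, as noted above."""
    return a[0] <= b[1] and b[0] <= a[1]

# Example: a head at index 2 with dependents at indices 1 and 3
# yields the span (1, 3).
span = head_to_span(2, {2: [1, 3]})
```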

The InVeRo-XL API
In order to facilitate the integration of predicate-argument structure information into downstream tasks, InVeRo-XL exposes its fully self-contained end-to-end multilingual SRL pipeline through an easy-to-use RESTful API. In the following, we provide an overview of the main functionalities of the InVeRo-XL API, from its Resource API (Section 3.1) to its Model API (Section 3.2), and explain how to host InVeRo-XL locally on a user's own hardware (Section 3.3). We refer users to the online documentation for the complete list of supported languages and inventories, together with other details. 6

Resource API
The Resource API is a simple way of obtaining semantic information about predicates using the intelligible verb senses and semantic roles defined by VerbAtlas, a large-scale predicate-argument structure inventory which clusters WordNet synsets (Miller, 1992) that share similar semantic behavior. The Resource API of InVeRo-XL builds upon the functionalities provided by its predecessor, with the key difference that it now supports multiple languages thanks to BabelNet 5.0 7 (Navigli and Ponzetto, 2012), a multilingual encyclopedic dictionary that provides unified access to several knowledge bases including WordNet.
More specifically, the Resource API defines two endpoints: • /api/verbatlas/predicate: given a predicate p, this endpoint retrieves the set of VerbAtlas frames which include at least one sense of p.
• /api/verbatlas/frame: given a Ver-bAtlas frame f , this endpoint retrieves its predicate-argument structure, i.e., the semantic roles, and the WordNet/BabelNet synsets that belong to f .
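For instance, a client might build Resource API queries as follows. Note that the query-parameter names (`lemma`, `lang`, `frame`) are assumptions for illustration; the endpoints themselves are the ones listed above, and the online documentation describes the exact request format:

```python
from urllib.parse import urlencode

BASE_URL = "http://nlp.uniroma1.it/invero"

def predicate_url(lemma: str, lang: str = "EN") -> str:
    # Retrieves the VerbAtlas frames that include at least one sense
    # of the given predicate. Parameter names are hypothetical.
    return f"{BASE_URL}/api/verbatlas/predicate?" + urlencode(
        {"lemma": lemma, "lang": lang}
    )

def frame_url(frame: str) -> str:
    # Retrieves the predicate-argument structure of a VerbAtlas frame.
    return f"{BASE_URL}/api/verbatlas/frame?" + urlencode({"frame": frame})

url = predicate_url("eat")
# The request itself could then be sent with any HTTP client,
# e.g. urllib.request.urlopen(url).
```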

Model API
The Model API of InVeRo-XL has been updated not only to take advantage of the new multilingual SRL system but also to provide quality-of-life improvements. The Model API now accepts requests in over 40 languages and returns semantic annotations according to 7 linguistic inventories. On top of this, the Model API is now able to process documents of arbitrary length and to handle batches of documents in a single request. More specifically, the Model API exposes an endpoint named /api/model. This endpoint accepts POST requests with a JSON body containing a list of input objects, one for each document the user wishes to annotate. Each input object supports the following fields: • text: a mandatory field that contains the text of the document. • lang: an optional field that indicates the language of the document. If omitted, InVeRo-XL will use an automatic language detector (see Section 2.1).
Each request to the Model API returns a JSON response containing a list of output objects, one for each input document, containing the automatic annotations according to each of the 7 linguistic inventories, as shown in Figure 1. Users can search for predicate information (e.g. the VerbAtlas frames a verb belongs to) and tag sentences in multiple languages with different linguistic inventories (see Figure 3).
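As a sketch, a batched request body can be assembled as follows. The `text` and `lang` fields are the ones documented above; the internal shape of the output objects, beyond "one output object per input document", is not modeled here:

```python
import json

# Endpoint documented above; the full URL scheme is an assumption.
API_URL = "http://nlp.uniroma1.it/invero/api/model"

def build_request(documents):
    """Build the JSON body for a batched POST to /api/model.

    `documents` is a list of (text, lang) pairs; when lang is None the
    field is omitted, so that the service falls back to automatic
    language detection (Section 2.1).
    """
    body = []
    for text, lang in documents:
        doc = {"text": text}
        if lang is not None:
            doc["lang"] = lang
        body.append(doc)
    return json.dumps(body)

payload = build_request([
    ("Nel mezzo del cammin di nostra vita mi ritrovai...", "IT"),
    ("The cat sat on the mat.", None),  # language auto-detected
])
# The payload would then be POSTed with any HTTP client, e.g.
# urllib.request.Request(API_URL, data=payload.encode(),
#                        headers={"Content-Type": "application/json"}).
```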

Offline Usage
One of the most requested features that is currently missing from InVeRo is the possibility of running an offline instance of the service so as to annotate large quantities of text in a shorter time, independently of the latency of the network and the volume of requests being processed by our Web server. To address this issue, InVeRo-XL is also distributed as a Docker 8 image that can be deployed locally on a user's own hardware. 9 While network latency is often a bottleneck for processing a request, an offline instance of InVeRo-XL does not suffer from such a constraint and can therefore benefit greatly from running on better hardware, e.g. on GPU. We distribute InVeRo-XL in two configurations: • invero-xl-span is the configuration that performs multilingual span-based SRL and is the one used by InVeRo-XL's Web server; • invero-xl-dependency is an alternative configuration built to perform multilingual dependency-based SRL.
Running a local instance of InVeRo-XL is also simple. First, users are required to perform a one-time setup to load one of the available images:
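The original listing is not reproduced here; the commands below are a hypothetical sketch of a standard Docker workflow. Only the configuration names (invero-xl-span and invero-xl-dependency) come from the text; the archive file name, port, and flags are assumptions, so the documentation shipped with the image should be consulted for the real values:

```shell
# One-time setup: load one of the two distributed images into Docker.
# The archive file name is hypothetical; use the file you downloaded.
docker load -i invero-xl-span.tar

# Start a local instance; the exposed port and image tag are assumptions.
docker run --rm -p 8080:80 invero-xl-span
```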

Web Interface
Similarly to its predecessor, InVeRo-XL includes a public-facing Web interface (Figures 2 and 3) that provides a visual environment for both the Resource API and the Model API, allowing users to explore the main functionalities while also providing an intuitive overview of how an SRL system annotates a sentence or a short document. Most importantly, the Web interface of InVeRo-XL has been updated to reflect the changes in the Model API and the underlying SRL model; now users can annotate text in 12 languages 11 and visualize predicate senses and semantic roles in 7 linguistic inventories on the fly, without having to write code. Figure 3 shows an example sentence in Italian with its corresponding predicate senses and semantic roles as provided by InVeRo-XL. Thanks to a dropdown menu, users can immediately switch from the labels of one inventory to those of another, independently of the input language, without reloading the Web page. We argue that this Web interface should help teachers explain SRL to their students, allow linguists to compare linguistic inventories on particular case studies, attract new researchers to the field, and inspire others to exploit SRL in downstream tasks or even real-world scenarios.

10 http://nlp.uniroma1.it/invero/api-documentation
11 We limit the number of languages available on the Web interface due to hardware constraints, as Stanza and spaCy use one preprocessing model for each language.

Figure 3: The beginning of the Divine Comedy by Dante Alighieri in Italian as tagged by InVeRo-XL with three predicate-argument structure inventories: VerbAtlas, the Chinese PropBank and the Spanish AnCora. "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura, ché la diritta via era smarrita" translates into "Midway upon the journey of our life I found myself within a forest dark, for the straightforward pathway had been lost".

Conclusion and Future Work
Over the years, the research community has greatly advanced the field of SRL, proposing ever more complex approaches to tackle the task more effectively. However, despite the growing interest in cross-lingual NLP, there have been very few efforts to develop automatic tools to perform SRL in multiple languages. Our objective with InVeRo-XL is to fill this gap and equip researchers with an easy-to-use, high-performing system capable of providing predicate sense and semantic role annotations in over 40 languages with 7 linguistic inventories. Users can take advantage of our state-of-the-art system for cross-lingual SRL through a RESTful API that relieves them from the need to reimplement complex neural models and/or to build an efficient preprocessing/postprocessing pipeline.
Although InVeRo-XL is a major step forward compared to its predecessor, we intend to further improve our system by adopting more advanced SRL model architectures as they become available and by including new training datasets, such as UniteD-SRL. We strongly believe that InVeRo-XL will facilitate the integration of SRL into downstream cross-lingual tasks, hopefully aiding further advancements in cross-lingual Natural Language Understanding.