AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation

Over the past few years, Word Sense Disambiguation (WSD) has received renewed interest: recently proposed systems have shown the remarkable effectiveness of deep learning techniques in this task, especially when aided by modern pretrained language models. Unfortunately, such systems are still not available as ready-to-use end-to-end packages, making it difficult for researchers to take advantage of their performance. The only alternative for a user interested in applying WSD to downstream tasks is to fall back on currently available end-to-end WSD systems, which, however, still rely on graph-based heuristics or non-neural machine learning algorithms. In this paper, we fill this gap and propose AMuSE-WSD, the first end-to-end system to offer high-quality sense information in 40 languages through a state-of-the-art neural model for WSD. We hope that AMuSE-WSD will provide a stepping stone for the integration of meaning into real-world applications and encourage further studies in lexical semantics. AMuSE-WSD is available online at http://nlp.uniroma1.it/amuse-wsd.


Introduction
Word Sense Disambiguation (WSD) is the task of associating a word in context with its most appropriate sense from a predefined sense inventory. Learning the meaning of a word in context is often considered to be a fundamental step in enabling machine understanding of text (Navigli, 2018): indeed, a word can be polysemous, that is, it can refer to different meanings depending on the context. Over the past few years, WSD has received growing attention and has been proven to be useful in an increasing range of applications, such as machine translation (Liu et al., 2018; Pu et al., 2018; Raganato et al., 2019), information extraction (Zhong and Ng, 2012; Delli Bovi et al., 2015), information retrieval (Blloshmi et al., 2021), text categorization (Shimura et al., 2019) and question answering (Ramakrishnan et al., 2003).
WSD approaches usually fall into two categories: knowledge-based (Moro et al., 2014; Agirre et al., 2014; Chaplot and Salakhutdinov, 2018), which leverage computational lexicons, and supervised (Bevilacqua and Navigli, 2020; Blevins and Zettlemoyer, 2020; Barba et al., 2021; ElSheikh et al., 2021), which train machine learning models on sense-annotated data. While early work mainly belongs to the former category (Navigli, 2009), recent studies have shown the superiority in performance of the latter category, especially thanks to complex neural networks. However, as WSD systems become more and more reliant on increasingly complex input representations (pretrained language models are becoming the de facto representation method in several NLP tasks) and on deeper neural architectures, the entry requirements for end users have also become higher. Indeed, the implementation of state-of-the-art WSD systems is frequently anything but straightforward. What is more, such systems are often not ready to be used off-the-shelf and require additional preprocessing and postprocessing modules for document splitting, tokenization, lemmatization and part-of-speech tagging. These complications may make the use of recent high-performing WSD systems unattractive, or even out of reach, for researchers who want to take advantage of explicit semantic information in other areas of research, but who are not experts in WSD.
In order to make WSD more accessible, there have been several attempts at providing ready-to-use WSD systems that can be easily integrated into other systems (Navigli and Ponzetto, 2012b; Moro et al., 2014; Agirre et al., 2014; Scozzafava et al., 2020). Nevertheless, current ready-to-use WSD systems are either English-only or based on approaches that now lag behind state-of-the-art models in terms of performance.
In this paper, we fill the gap and present AMuSE-WSD, an easy-to-use, off-the-shelf WSD package that provides sense annotations in multiple languages through a state-of-the-art neural-based model. The main features of AMuSE-WSD can be summarized as follows:
• We propose the first ready-to-use WSD package to offer a multilingual WSD system built on top of modern pretrained language models;
• AMuSE-WSD offers an easy-to-use REST API that can be queried either online for ease of use, or offline to minimize inference times;
• Our system comes with a Web interface to let users disambiguate short documents on the fly without a single line of code;
• We support 40 languages offline and 10 languages online.
We hope that AMuSE-WSD will facilitate the integration of semantic information into tasks that may benefit from high-quality sense information, enabling the exploitation of WSD in multiple languages.

System Description
AMuSE-WSD is the first all-in-one multilingual system for WSD based on a state-of-the-art neural model. Our system encapsulates this model in a pipeline which provides word-level semantic information in an end-to-end fashion, so that a user is not required to provide anything more than raw text as input. In this section, we describe the various components of our system, focusing on its preprocessing pipeline (Section 2.1) and on the WSD model at the core of AMuSE-WSD (Section 2.2). Furthermore, we provide an overview of its performance on several standard benchmarks for WSD (Section 2.3).

Preprocessing
Preprocessing is one important aspect that is often overlooked by state-of-the-art models built for research purposes, as they often rely on pre-parsed documents that are already split into sentences, tokenized, lemmatized and PoS-tagged. However, a good preprocessing pipeline is a necessary condition for a high-quality WSD system. Indeed, the most popular sense inventories for WSD, e.g. WordNet (Miller, 1992) and BabelNet (Navigli and Ponzetto, 2012a), define the possible meanings of a word with respect to its lemma and PoS tag. Therefore, an accurate preprocessing pipeline is fundamental in order to generate the correct candidate set of possible meanings. AMuSE-WSD's preprocessing pipeline takes advantage of two popular toolkits, spaCy (Honnibal et al., 2020) and Stanza (Qi et al., 2020), to provide high-quality document splitting, tokenization, lemmatization and PoS tags to the core WSD model. We provide more technical details about the features of our preprocessing pipeline in Section 3.
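To make the routing concrete, the toolkit-selection policy can be sketched in a few lines of Python (an illustrative sketch: the set of languages with fast spaCy models below is a made-up placeholder, not the actual table used by AMuSE-WSD):

```python
# Illustrative per-language preprocessing routing: prioritize spaCy for
# inference speed, fall back to Stanza for lower-resource languages.
# NOTE: this language set is a hypothetical placeholder.
SPACY_LANGUAGES = {"en", "de", "es", "fr", "it"}

def select_toolkit(lang: str) -> str:
    """Return the preprocessing toolkit to use for an ISO language code."""
    return "spacy" if lang.lower() in SPACY_LANGUAGES else "stanza"
```

In practice, the chosen toolkit would then supply sentence splitting, tokenization, lemmas and PoS tags to the downstream WSD model.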

Model Architecture
The core of AMuSE-WSD is its WSD model. Since the main objective of our system is to provide the best possible automatic annotations for WSD, our package features a reimplementation of a recently proposed state-of-the-art WSD model. Unlike other ready-to-use WSD packages, which are based on graph-based heuristics (Moro et al., 2014; Scozzafava et al., 2020) or non-neural models (Papandrea et al., 2017), this neural architecture is built on top of a Transformer encoder (Vaswani et al., 2017). More specifically, given a word w in context, the WSD model i) builds a contextualized representation e_w ∈ R^{d_L} of the word w as the average of the hidden states of the last four layers of a pretrained Transformer encoder L, ii) applies a non-linear transformation to obtain a sense-specific hidden representation h_w ∈ R^{d_h}, and finally iii) computes the output score distribution o_w ∈ R^{|S|} over all the possible senses of a sense inventory S. More formally:

e_w = (1/4) Σ_{k=1..4} l_w^{-k}
h_w = Swish(BatchNorm(W_h e_w + b_h))
o_w = W_o h_w + b_o

where l_w^{-k} is the hidden state of the k-th layer of L from its topmost layer, W_h, b_h, W_o and b_o are learnable parameters, BatchNorm(·) is the batch normalization operation, and Swish(x) = x · sigmoid(x) is the Swish activation function (Ramachandran et al., 2018).
We follow the original model in framing WSD as a multi-label classification problem in which the model can learn to assign multiple valid senses to each target word. Indeed, the "boundaries" between different senses of a polysemous word are not always clear cut or well defined (Erk and McCarthy, 2009), often leading to cases in which, given a word in context, more than one meaning is deemed appropriate by human annotators. Framing WSD as a multi-label classification problem allows the model to take advantage of such cases. In particular, this means that the model is trained to predict whether a sense s ∈ S_w is appropriate for a word w in a given context, independently of the other senses in S_w.

[Table 1: Comparison with currently available WSD systems. We distinguish between WSD Modules, that is, research systems that need to be inserted into a pipeline, and End-to-End WSD Systems. Best results among end-to-end systems in bold.]
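The multi-label formulation above can be illustrated in plain Python (a toy sketch with made-up sense identifiers and logits; only the Swish definition is taken directly from the text, and the 0.5 threshold is an assumption of ours):

```python
import math

def swish(x: float) -> float:
    """Swish activation: x * sigmoid(x) (Ramachandran et al., 2018)."""
    return x * (1.0 / (1.0 + math.exp(-x)))

def predict_senses(logits: dict, threshold: float = 0.5) -> list:
    """Multi-label prediction: every candidate sense whose sigmoid-ed
    score clears the threshold is deemed appropriate, independently of
    the other senses. `logits` maps sense ids to raw scores."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return [s for s, score in logits.items() if sigmoid(score) >= threshold]
```

Because each sense is scored independently, the model can return more than one valid sense for a word, mirroring the cases in which human annotators disagree.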
Evaluation

We evaluate AMuSE-WSD both on the standard English benchmarks and on XL-WSD, a multilingual dataset which comprises 17 languages.
Experimental setup. We evaluate how the performance of AMuSE-WSD varies when using four different pretrained language models to represent input sentences: a high-performing English-only version based on BERT-large-cased (Devlin et al., 2019), a high-performing multilingual version based on XLM-RoBERTa-large (Conneau et al., 2020), a multilingual version that relies on the smaller XLM-RoBERTa-base to balance quality and inference time, and a multilingual version based on Multilingual-MiniLM (Wang et al., 2020) that minimizes inference time while still providing good results. In all configurations, the weights of the underlying language model are left frozen, that is, they are not updated during training. Each model is trained for 25 epochs using Adam (Kingma and Ba, 2015) with a learning rate of 10^-4. We train each model configuration on the union of SemCor (Miller et al., 1994) and the WordNet Gloss Corpus. Following standard practice, we perform model selection by choosing the checkpoint with the highest F1 score on SemEval-2007, the smallest evaluation dataset.
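The model-selection rule above amounts to a simple argmax over checkpoints, as in the following sketch (the checkpoint names and F1 values are made up for illustration):

```python
def select_checkpoint(dev_f1_by_checkpoint: dict) -> str:
    """Return the checkpoint id with the highest development-set F1
    (here, F1 on SemEval-2007), following standard practice in WSD."""
    return max(dev_f1_by_checkpoint, key=dev_f1_by_checkpoint.get)
```

For instance, `select_checkpoint({"epoch_10": 68.2, "epoch_20": 71.5, "epoch_25": 70.9})` selects `"epoch_20"`.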
Results. Table 1 shows how AMuSE-WSD performs in comparison to currently available end-to-end WSD systems, that is, systems that only require raw text as input. AMuSE-WSD (XLMR-large) offers a very significant improvement over SyntagRank (Scozzafava et al., 2020), the previous best end-to-end system, both in English WSD (+7.6 in F1 score on the concatenation of all the English test sets) and in multilingual WSD (+6.8, +6.8 and +9.6 in F1 score on SemEval-2013, SemEval-2015 and XL-WSD, respectively). It is worth noting that even the lightest (and therefore fastest) configuration of AMuSE-WSD (m-MiniLM) still shows significant improvements over SyntagRank (+3.4 and +6.2 in F1 score on ALL and XL-WSD, respectively). For completeness, Table 1 also reports the performance of recently proposed non-end-to-end modules for WSD.

AMuSE-WSD API
AMuSE-WSD can easily be used to integrate sense information into downstream tasks. Indeed, AMuSE-WSD is a simple solution for out-of-the-box sense disambiguation, as it offers a RESTful API that provides access to a full end-to-end state-of-the-art pretrained model for multilingual WSD.
The main advantage of our system is that it is fully self-contained, as it takes care of the preprocessing information usually needed by a WSD system, namely, document splitting, tokenization, lemmatization and part-of-speech tagging (see Section 2.1). This means that AMuSE-WSD takes just raw text as input, freeing the user from the burden of having to compose complex preprocessing pipelines. Furthermore, our system is optimized to speed up inference, especially on CPU, which makes AMuSE-WSD accessible to a broader audience with smaller hardware budgets. Finally, the API is available both as a Web API, so as to obtain sense annotations without installing any software, and as an offline API, so that a user can host AMuSE-WSD locally and minimize inference times. In the following, we provide more details on the main set of functionalities offered by AMuSE-WSD.
Document-level preprocessing. The AMuSE-WSD API lets the user obtain disambiguated text for batches of sentences in a single query to the service, reducing the load on the network and the latency of the responses. On top of this, one common use case for end users is the disambiguation of long texts. In order to assist users in such cases, AMuSE-WSD supports the disambiguation of documents of arbitrary length, performing document splitting to enable faster inference by transparently batching text segments.
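The batching step described above can be sketched as follows (a simplified illustration: real sentence segmentation is delegated to the spaCy/Stanza pipeline, and the batch size is a made-up parameter):

```python
def batch_segments(sentences: list, batch_size: int = 32) -> list:
    """Group pre-split sentences into fixed-size batches so that a long
    document can be disambiguated with a small number of batched model
    calls instead of one call per sentence."""
    return [sentences[i:i + batch_size]
            for i in range(0, len(sentences), batch_size)]
```

Batching this way reduces both the number of network round trips and the per-sentence overhead of the underlying model.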
Sentence-level preprocessing. The only piece of information the end user needs to provide to AMuSE-WSD is raw text. Nonetheless, the underlying model needs lemmas and part-of-speech tags in order to select the proper sense for a word (see Section 2.1). To this end, AMuSE-WSD integrates a preprocessing pipeline, transparent to the user, that performs tokenization, lemmatization and part-of-speech tagging. In the search for the optimal compromise between latency and accuracy of the preprocessing, AMuSE-WSD employs two different preprocessing pipelines depending on the language of the input document, built around spaCy and Stanza. For each of the 40 supported languages, our system selects one of the preprocessing models available in either spaCy or Stanza, prioritizing the former for its high inference speed and falling back to Stanza for lower-resource languages where the performance of the former is suboptimal.
Usage. The AMuSE-WSD API exposes an endpoint named /api/model. The endpoint accepts POST requests with a JSON body containing a list of documents. For each document, two parameters must be specified:
• text: the text of the document.
• lang: the language of the document.

For example, the body of a request for a single English document is a one-element list:

[{ "text": "The quick brown fox jumps over the lazy dog.", "lang": "EN" }]
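From Python, a query to the online API can be sketched with the standard library alone (an illustrative sketch: the full URL is assembled from the address and endpoint given in this paper, and the helper name is ours):

```python
import json
from urllib import request

# Online endpoint: system address + /api/model (see paper).
API_URL = "http://nlp.uniroma1.it/amuse-wsd/api/model"

def build_request(documents: list) -> request.Request:
    """Build a POST request whose JSON body is a list of documents,
    each with the required "text" and "lang" fields."""
    body = json.dumps(documents).encode("utf-8")
    return request.Request(API_URL, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

# Usage (performs a network call, hence commented out here):
# req = build_request([{"text": "He sat on the bank of the river.", "lang": "EN"}])
# with request.urlopen(req) as resp:
#     annotations = json.load(resp)
```

The response is a JSON structure with the sense annotations for each document; see the online documentation for its exact layout.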
We refer to the online documentation for more details, including the full list of supported languages.

Offline API
To further promote the integration of WSD into large-scale applications, AMuSE-WSD is also distributed as a Docker image that can be deployed locally on the user's own hardware. We provide several ready-to-use images suitable for different configurations and use cases, depending on whether a user is constrained by hardware requirements or prefers the highest quality over shorter inference times. The images are differentiated based on their size (i.e., the number of parameters of the underlying language model) and on the hardware they are optimized for, that is, CPU or GPU. In the following, we provide more details on the Docker images.
Model configurations. With AMuSE-WSD, an end user can choose between four different types of pre-built Docker images which differ in the pretrained language model used to compute contextualized word representations:
• amuse-large uses BERT-large and provides state-of-the-art results in English WSD.
• amuse-large-multilingual employs XLM-RoBERTa-large and thus offers the best results in multilingual WSD. However, it is also the most demanding in terms of hardware requirements.
• amuse-medium-multilingual adopts XLM-RoBERTa-base, which provides outputs that in 98% of cases are the same as those of its larger counterpart, while taking half the time.
• amuse-small-multilingual uses the multilingual version of MiniLM, a language model distilled from XLM-RoBERTa-base. It is three times faster and three times smaller, while still achieving remarkable results.
Inference times. In order to further promote its accessibility, AMuSE-WSD is also available in Docker images optimized to run on CPU thanks to ONNX. The ONNX Runtime is an engine built with the aim of significantly improving inference times while, at the same time, reducing system footprint. With ONNX, our system is up to 2.5 times faster on CPU while also being 1.2 times smaller compared to its vanilla PyTorch implementation. We evaluate the inference speed of AMuSE-WSD on two sets of documents: the concatenation of all the sentences of the unified evaluation framework for WSD (ALL), and a set of 300 documents whose lengths range from 1500 to 3500 characters (Long). Table 2 compares the inference times of our system in several configurations, showing the gains of ONNX on CPU.

[Table 2: Inference time for several configurations over two sets of documents: the concatenation of all the test sets of the unified evaluation framework for WSD (ALL), and a set of 300 documents whose lengths are between 1500 and 3500 characters (Long). ∆: latency, i.e., the median time in milliseconds AMuSE-WSD takes to serve one inference request. ↑: inference speed-up provided by the ONNX and CUDA implementations over vanilla PyTorch.]

Web Interface
AMuSE-WSD comes with a Web interface which allows users to disambiguate text on the fly without having to write a single line of code. Figure 1 shows the home page of AMuSE-WSD, in which a user can type a short text (up to 3000 characters) and indicate the language of the inserted text. Figure 2 shows, instead, how the Web interface displays the senses for the input sentence "the quick brown fox jumps over the lazy dog", while Figures 3 and 4 show a correctly disambiguated word, that is, bank, which is ambiguous both in English and Italian. For each content word (nouns, adjectives, adverbs and verbs), the interface shows the corresponding sense predicted by the underlying WSD model, including its definition from WordNet and an image, if available. Each meaning is linked to its corresponding Web page in BabelNet (https://babelnet.org), a multilingual encyclopedic dictionary that includes WordNet, Open Multilingual WordNet, Wikipedia, Wikidata and Wiktionary, inter alia. Users can click on a meaning to obtain further information about it, ranging from other definitions to its translations in other languages, and from its hypernyms (generalizations) to its hyponyms (specializations). We believe that this interface will be especially useful for researchers, teachers, students and curious minds who may be interested in understanding how WSD can be leveraged in other fields. Moreover, we also believe that an easy-to-use interface will attract the attention of more researchers to the task of WSD itself, encouraging future developments in this area.

Conclusion
Over the past few years, WSD has witnessed an increasing rate of development, especially thanks to the renewed interest in neural networks and the advent of modern pretrained language models. However, research implementations are often far from being ready-to-use as they either take as input pre-parsed documents or require setting up a preprocessing pipeline to take care, at least, of document splitting, tokenization, lemmatization and PoS tagging. Unfortunately, currently available ready-to-use WSD systems now lag behind the state of the art and offer solutions based on non-neural approaches.
In this paper, we addressed this issue and presented AMuSE-WSD, an All-in-one Multilingual System for Easy Word Sense Disambiguation. To the best of our knowledge, AMuSE-WSD is the first system for end-to-end WSD to encapsulate a state-of-the-art neural model for sense disambiguation in multiple languages. Our system makes it easy to obtain and use word-level semantic information through a RESTful API. This API is available both online, that is, a user can disambiguate text through HTTP requests to our server, and offline, that is, a user can download prepackaged Docker images and run them locally to minimize inference times on large collections of documents. We provide different configurations to satisfy different needs, from images that are optimized to run on constrained hardware to high-performing images. Last but not least, AMuSE-WSD comes with an intuitive Web interface that lets users disambiguate short documents on the fly without writing a single line of code. Not only does this interface showcase the capabilities of our system, but we also hope it will attract the interest of new researchers to the field of lexical semantics.