Roberto Navigli - ACL Anthology

Roberto Navigli

2025

SemEval-2025 Task 2: Entity-Aware Machine Translation
Simone Conia | Min Li | Roberto Navigli | Saloni Potdar
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Translating text that contains complex or challenging named entities—e.g., cultural-specific book and movie titles, location names, proper nouns, food names, etc.—remains a difficult task for modern machine translation systems, including the latest large language models. To systematically study and advance progress in this area, we organized Entity-Aware Machine Translation, or EA-MT, a shared task that evaluates how well systems handle entity translation across 10 language pairs. With EA-MT, we introduce XC-Translate, a novel gold benchmark comprising over 50K manually-translated sentences with entity names that can deviate significantly from word-to-word translations in their target languages. This paper describes the creation process of XC-Translate, provides an overview of the approaches explored by our participants, presents the main evaluation findings, and points toward open research directions, such as contextual retrieval methods for low-resource entities and more robust evaluation metrics for entity correctness. We hope that our shared task will inspire further research in entity-aware machine translation and foster the development of more culturally-accurate translation systems.

LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA
Tommaso Bonomo | Luca Gioffré | Roberto Navigli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Question Answering (QA) on narrative text poses a unique challenge to current systems, requiring a deep understanding of long, complex documents. However, the reliability of NarrativeQA, the most widely used benchmark in this domain, is hindered by noisy documents and flawed QA pairs. In this work, we introduce LiteraryQA, a high-quality subset of NarrativeQA focused on literary works. Using a human- and LLM-validated pipeline, we identify and correct low-quality QA samples while removing extraneous text from source documents. We then carry out a meta-evaluation of automatic metrics to clarify how systems should be evaluated on LiteraryQA.This analysis reveals that all n-gram-based metrics have a low system-level correlation to human judgment, while LLM-as-a-Judge evaluations, even with small open-weight models, can strongly agree with the ranking identified by humans.Finally, we benchmark a set of long-context LLMs on LiteraryQA. We release our code and data at https://github.com/sapienzaNLP/LiteraryQA.

How Much Do Encoder Models Know About Word Senses?
Simone Teglia | Simone Tedeschi | Roberto Navigli
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word Sense Disambiguation (WSD) is a key task in Natural Language Processing (NLP), involving selecting the correct meaning of a word based on its context. With Pretrained Language Models (PLMs) like BERT and DeBERTa now well established, significant progress has been made in understanding contextual semantics. Nevertheless, how well these models inherently disambiguate word senses remains uncertain. In this work, we evaluate several encoder-only PLMs across two popular inventories (i.e. WordNet and the Oxford Dictionary of English) by analyzing their ability to separate word senses without any task-specific fine-tuning. We compute centroids of word senses and measure similarity to assess performance across different layers. Our results show that DeBERTa-v3 delivers the best performance on the task, with the middle layers (specifically the 7th and 8th layers) achieving the highest accuracy, outperforming the output layer by approximately 15 percentage points. Our experiments also explore the inherent structure of WordNet and ODE sense inventories, highlighting their influence on the overall model behavior and performance. Finally, based on our findings, we develop a small, efficient model for the WSD task that attains robust performance while significantly reducing the carbon footprint.

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
Luca Moroni | Giovanni Puccetti | Pere-Lluís Huguet Cabot | Andrei Stefan Bejgu | Alessio Miaschi | Edoardo Barba | Felice Dell’Orletta | Andrea Esuli | Roberto Navigli
Findings of the Association for Computational Linguistics: NAACL 2025

The number of pretrained Large Language Models (LLMs) is increasing steadily, though the majority are designed predominantly for the English language. While state-of-the-art LLMs can handle other languages, due to language contamination or some degree of multilingual pretraining data, they are not optimized for non-English languages, leading to inefficient encoding (high token “fertility”) and slower inference speed.In this work, we thoroughly compare a variety of vocabulary adaptation techniques for optimizing English LLMs for the Italian language, and put forward Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that leverages neural mapping for vocabulary substitution. SAVA achieves competitive performance across multiple downstream tasks, enhancing grounded alignment strategies. We adapt two LLMs: Mistral-7B-v0.1, reducing token fertility by 25%, and Llama-3.1-8B, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, following the adaptation of the vocabulary, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the capabilities of the adapted models on various multi-choice and generative tasks.

Estimating Machine Translation Difficulty
Lorenzo Proietti | Stefano Perrella | Vilém Zouhar | Roberto Navigli | Tom Kocmi
Findings of the Association for Computational Linguistics: EMNLP 2025

Machine translation quality has steadily improved over the years, achieving near-perfect translations in recent benchmarks.These high-quality outputs make it difficult to distinguish between state-of-the-art models and to identify areas for future improvement.In this context, automatically identifying texts where machine translation systems struggle holds promise for developing more discriminative evaluations and guiding future research.In this work, we address this gap by formalizing the task of translation difficulty estimation, defining a text’s difficulty based on the expected quality of its translations.We introduce a new metric to evaluate difficulty estimators and use it to assess both baselines and novel approaches.Finally, we demonstrate the practical utility of difficulty estimators by using them to construct more challenging benchmarks for machine translation. Our results show that dedicated models outperform both heuristic-based methods and LLM-as-a-judge approaches, with sentinel-src achieving the best performance.Thus, we release two improved models for difficulty estimation, sentinel-src-24 and sentinel-src-25, which can be used to scan large collections of texts and select those most likely to challenge contemporary machine translation systems.

RAED: Retrieval-Augmented Entity Description Generation for Emerging Entity Linking and Disambiguation
Karim Ghonim | Pere-Lluís Huguet Cabot | Riccardo Orlando | Roberto Navigli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Entity Linking and Entity Disambiguation systems aim to link entity mentions to their corresponding entries, typically represented by descriptions within a predefined, static knowledge base. Current models assume that these knowledge bases are complete and up-to-date, rendering them incapable of handling entities not yet included therein. However, in an ever-evolving world, new entities emerge regularly, making these static resources insufficient for practical applications. To address this limitation, we introduce RAED, a model that retrieves external knowledge to improve factual grounding in entity descriptions. Using sources such as Wikipedia, RAED effectively disambiguates entities and bases their descriptions on factual information, reducing the dependence on parametric knowledge. Our experiments show that retrieval not only enhances overall description quality metrics, but also reduces hallucinations. Moreover, despite not relying on fixed entity inventories, RAED outperforms systems that require predefined candidate sets at inference time on Entity Disambiguation. Finally, we show that descriptions generated by RAED provide useful entity representations for downstream Entity Linking models, leading to improved performance in the extremely challenging Emerging Entity Linking task.

Concept-pedia: a Wide-coverage Semantically-annotated Multimodal Dataset
Karim Ghonim | Andrei Stefan Bejgu | Alberte Fernández-Castro | Roberto Navigli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Vision-language Models (VLMs), such as CLIP and SigLIP, have become the de facto standard for multimodal tasks, serving as essential building blocks for recent Multimodal Large Language Models, including LLaVA and PaliGemma. However, current evaluations for VLMs remain heavily anchored to ImageNet. In this paper, we question whether ImageNet’s coverage is still sufficiently challenging for modern VLMs, and investigate the impact of adding novel and varied concept categories, i.e., semantically grouped fine-grained synsets. To this end, we introduce Concept-pedia, a novel, large-scale, semantically-annotated multimodal resource covering more than 165,000 concepts. Leveraging a language-agnostic, automatic annotation pipeline grounded in Wikipedia, Concept-pedia expands the range of visual concepts, including diverse abstract categories. Building on Concept-pedia, we also present a manually-curated Visual Concept Recognition evaluation benchmark, Concept-10k, that spans thousands of concepts across a wide range of categories. Our experiments show that current models, although excelling on ImageNet, struggle with Concept-10k. Not only do these findings highlight a persistent bias toward ImageNet-centric concepts, but they also underscore the urgent need for more representative benchmarks. By offering a broader and semantically richer testbed, Concept-10k aims to support the development of multimodal systems that better generalize to the complexities of real-world visual concepts.

Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
Taishi Nakamura | Mayank Mishra | Simone Tedeschi | Yekun Chai | Jason T. Stillerman | Felix Friedrich | Prateek Yadav | Tanmay Laud | Vu Minh Chien | Terry Yue Zhuo | Diganta Misra | Ben Bogin | Xuan-Son Vu | Marzena Karpinska | Arnav Varma Dantuluri | Wojciech Kusa | Tommaso Furlanello | Rio Yokota | Niklas Muennighoff | Suhas Pai | Tosin Adewumi | Veronika Laippala | Xiaozhe Yao | Adalberto Barbosa Junior | Aleksandr Drozd | Jordan Clive | Kshitij Gupta | Liangyu Chen | Qi Sun | Ken Tsui | Nour Moustafa-Fahmy | Nicolo Monti | Tai Dang | Ziyang Luo | Tien-Tung Bui | Roberto Navigli | Virendra Mehta | Matthew Blumberg | Victor May | Hiep Nguyen | Sampo Pyysalo
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Pretrained language models are integral part of AI applications, but their high computational cost for training limits accessibility. Initiatives such as Bloom and StarCoder aim to democratize access to pretrained models for collaborative community development. Despite these efforts, such models encounter challenges such as limited multilingual capabilities, risks of catastrophic forgetting during continual pretraining, and the high costs of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435B additional tokens, Aurora-M surpasses 2T tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. We evaluate Aurora-M across a wide range of tasks and languages, showcasing its robustness against catastrophic forgetting and its superior performance in multilingual settings, particularly in safety evaluations. We open-source Aurora-M and its variants to encourage responsible open-source development of large language models at https://huggingface.co/aurora-m.

What We Learned from Continually Training Minerva: A Case Study on Italian
Luca Moroni | Tommaso Bonomo | Luca Gioffré | Lu Xu | Domenico Fedele | Leonardo Colosi | Andrei Stefan Bejgu | Alessandro Scirè | Roberto Navigli
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress
Lorenzo Proietti | Stefano Perrella | Roberto Navigli
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In Machine Translation (MT) evaluation, metric performance is assessed based on agreement with human judgments. In recent years, automatic metrics have demonstrated increasingly high levels of agreement with humans. To gain a clearer understanding of metric performance and establish an upper bound, we incorporate human baselines in the MT meta-evaluation, that is, the assessment of MT metrics’ capabilities. Our results show that human annotators are not consistently superior to automatic metrics, with state-of-the-art metrics often ranking on par with or higher than human baselines. Despite these findings suggesting human parity, we discuss several reasons for caution. Finally, we explore the broader implications of our results for the research field, asking: Can we still reliably measure improvements in MT evaluation? With this work, we aim to shed light on the limits of our ability to measure progress in the field, fostering discussion on an issue that we believe is crucial to the entire MT evaluation community.

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
Francesco Maria Molfese | Luca Moroni | Luca Gioffré | Alessandro Scirè | Simone Conia | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2025

One of the most widely used tasks for evaluating Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA). While open-ended question answering tasks are more challenging to evaluate, MCQA tasks are, in principle, easier to assess, as the model’s answer is thought to be simple to extract and is compared directly to a set of predefined choices. However, recent studies have started to question the reliability of MCQA evaluation, showing that multiple factors can significantly impact the reported performance of LLMs, especially when the model generates free-form text before selecting one of the answer choices. In this work, we shed light on the inconsistencies of MCQA evaluation strategies, which can lead to inaccurate and misleading model comparisons. We systematically analyze whether existing answer extraction methods are aligned with human judgment, and how they are influenced by answer constraints in the prompt across different domains. Our experiments demonstrate that traditional evaluation strategies often underestimate LLM capabilities, while LLM-based answer extractors are prone to systematic errors. Moreover, we reveal a fundamental trade-off between including format constraints in the prompt to simplify answer extraction and allowing models to generate free-form text to improve reasoning. Our findings call for standardized evaluation methodologies and highlight the need for more reliable and consistent MCQA evaluation practices.

BOOKCOREF: Coreference Resolution at Book Scale
Giuliano Martinelli | Tommaso Bonomo | Pere-Lluís Huguet Cabot | Roberto Navigli
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Coreference Resolution systems are typically evaluated on benchmarks containing small- to medium-scale documents.When it comes to evaluating long texts, however, existing benchmarks, such as LitBank, remain limited in length and do not adequately assess system capabilities at the book scale, i.e., when co-referring mentions span hundreds of thousands of tokens.To fill this gap, we first put forward a novel automatic pipeline that produces high-quality Coreference Resolution annotations on full narrative texts. Then, we adopt this pipeline to create the first book-scale coreference benchmark, BOOKCOREF, with an average document length of more than 200,000 tokens.We carry out a series of experiments showing the robustness of our automatic procedure and demonstrating the value of our resource, which enables current long-document coreference systems to gain up to +20 CoNLL-F1 points when evaluated on full books.Moreover, we report on the new challenges introduced by this unprecedented book-scale setting, highlighting that current models fail to deliver the same performance they achieve on smaller documents.We release our data and code to encourage research and development of new book-scale Coreference Resolution systems at https://github.com/sapienzanlp/bookcoref.

Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines
Luca Moroni | Gianmarco Pappacoda | Edoardo Barba | Simone Conia | Andrea Galassi | Bernardo Magnini | Roberto Navigli | Paolo Torroni | Roberto Zanoli
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

DiBiMT: A Gold Evaluation Benchmark for Studying Lexical Ambiguity in Machine Translation
Federico Martelli | Stefano Perrella | Niccolò Campolungo | Tina Munda | Svetla Koeva | Carole Tiberius | Roberto Navigli
Computational Linguistics, Volume 51, Issue 2 - June 2025

Despite the remarkable progress made in the field of Machine Translation (MT), current systems still struggle when translating ambiguous words, especially when these express infrequent meanings. In order to investigate and analyze the impact of lexical ambiguity on automatic translations, several tasks and evaluation benchmarks have been proposed over the course of the last few years. However, work in this research direction suffers from critical shortcomings. Indeed, existing evaluation datasets are not entirely manually curated, which significantly compromises their reliability. Furthermore, current literature fails to provide detailed insights into the nature of the errors produced by models translating ambiguous words, lacking a thorough manual analysis across languages. With a view to overcoming these limitations, we propose Disambiguation Biases in MT (DiBiMT), an entirely manually curated evaluation benchmark for investigating disambiguation biases in eight language combinations and assessing the ability of both commercial and non-commercial systems to handle ambiguous words. We also examine and detail the errors produced by models in this scenario by carrying out a manual error analysis in all language pairs. Additionally, we perform an extensive array of experiments aimed at studying the behavior of models when dealing with ambiguous words. Finally, we show the ineffectiveness of standard MT evaluation settings for assessing the disambiguation capabilities of systems and highlight the need for additional efforts in this research direction and ad-hoc testbeds such as DiBiMT. Our benchmark is available at: https://nlp.uniroma1.it/dibimt/.

Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?
Luca Moroni | Javier Aula-Blasco | Simone Conia | Irene Baucells | Naiara Perez | Silvia Paniagua Suárez | Anna Sallés | Malte Ostendorff | Júlia Falcão | Guijin Son | Aitor Gonzalez-Agirre | Roberto Navigli | Marta Villegas
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

As large language models (LLMs) continue to improve, their evaluation increasingly centers on complex, high-level tasks, often at the expense of systematically assessing fundamental capabilities. To address this gap, recent work proposed LMentry, a compact benchmark comprising tasks that are trivial for humans but remain surprisingly difficult for LLMs. However, LMentry is limited to English, leaving its insights linguistically narrow. In this paper, we present Multi-LMentry, a ground-up recreation of LMentry that enables systematic evaluation of LLMs on basic reasoning and understanding tasks across nine diverse languages. Multi-LMentry includes English and expands to Basque, Brazilian Portuguese, Catalan, Galician, German, Italian, Korean, and Spanish, emphasizing the importance of cross-lingual and low-resource settings. To validate that Multi-LMentry is still trivial for humans, we demonstrate that L2 speakers with only elementary proficiency achieve near-perfect scores in a low-resource language, namely, Basque. Through extensive experiments, we reveal that state-of-the-art open-weight multilingual LLMs still fall short of human performance on elementary tasks in many languages. Our results expose new failure modes that remain hidden in monolingual evaluation, underscoring the need for rigorous, language-diverse “unit tests” of core model abilities.

Do Large Language Models Understand Word Senses?
Domenico Meconi | Simone Stirpe | Federico Martelli | Leonardo Lavalle | Roberto Navigli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Understanding the meaning of words in context is a fundamental capability for Large Language Models (LLMs). Despite extensive evaluation efforts, the extent to which LLMs show evidence that they truly grasp word senses remains underexplored. In this paper, we address this gap by evaluating both i) the Word Sense Disambiguation (WSD) capabilities of instruction-tuned LLMs, comparing their performance to state-of-the-art systems specifically designed for the task, and ii) the ability of two top-performing open- and closed-source LLMs to understand word senses in three generative settings: definition generation, free-form explanation, and example generation. Notably, we find that, in the WSD task, leading models such as GPT-4o and DeepSeek-V3 achieve performance on par with specialized WSD systems, while also demonstrating greater robustness across domains and levels of difficulty. In the generation tasks, results reveal that LLMs can explain the meaning of words in context up to 98% accuracy, with the highest performance observed in the free-form explanation task, which best aligns with their generative capabilities.We release our code and data at: https://github.com/Babelscape/LLM-WSD.

xCoRe: Cross-context Coreference Resolution
Giuliano Martinelli | Bruno Gatti | Roberto Navigli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Current coreference resolution systems are typically tailored for short- or medium-sized texts and struggle to scale to very long documents due to architectural limitations and implied memory costs.However, a few available solutions can be applied by inputting documents split into smaller windows. This is inherently similar to what happens in the cross-document setting, in which systems infer coreference relations between mentions that are found in separate documents.In this paper, we unify these two challenging settings under the general framework of cross-context coreference, and introduce xCoRe, a new unified approach designed to efficiently handle short-, long-, and cross-document coreference resolution.xCoRe adopts a three-step pipeline that first identifies mentions, then creates clusters within individual contexts, and finally merges clusters across contexts.In our experiments, we show that our formulation enables joint training on shared long- and cross-document resources, increasing data availability and particularly benefiting the challenging cross-document task.Our model achieves new state-of-the-art results on cross-document benchmarks and strong performance on long-document data, while retaining top-tier results on traditional datasets, positioning it as a robust, versatile solution that can be applied across all end-to-end coreference settings.We release our models and code at http://github.com/sapienzanlp/xcore.

2024

Efficient AMR Parsing with CLAP: Compact Linearization with an Adaptable Parser
Abelardo Carlos Martinez Lorenzo | Roberto Navigli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Sequence-to-sequence models have become the de facto standard for Abstract Meaning Representation (AMR) parsing due to their high-quality performance. However, these systems face efficiency challenges because of their large model size and computational time, which limit their accessibility within the research community. This paper aims to break down these barriers by introducing a novel linearization and system that significantly enhances the efficiency and accessibility of previous AMR parsers. First, we propose our novel Compact linearization that simplifies encoding, thereby reducing the number of tokens by between 40% and 50%. Second, we present CLAP, an innovative modular system that maintains the model’s high performance while achieving remarkable 80% reduction in training and inference times. Furthermore, CLAP is compatible with multiple autoregressive Language Models (LM) and tokenizers, such as BART, T5, and others. These advancements underscore the importance of optimizing sequence-to-sequence models in AMR parsing, thus democratizing access to high-quality semantic analysis. Our code is publicly available at https://github.com/SapienzaNLP/clap/.

Minerva LLMs: The First Family of Large Language Models Trained from Scratch on Italian Data
Riccardo Orlando | Luca Moroni | Pere-Lluís Huguet Cabot | Simone Conia | Edoardo Barba | Sergio Orlandini | Giuseppe Fiameni | Roberto Navigli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

The increasing popularity of Large Language Models (LLMs) has led to a surge in research on adapting existing models to different languages. However, the pretraining of non-English LLMs is still an underexplored area and there is no open-source endeavor that explores what is achievable with open Italian data. To address this issue, we present Minerva, the first family of LLMs trained from scratch on Italian data. The creation of Minerva is an opportunity to explore and investigate the pretraining of LLMs for the Italian language, outlining the challenges that arise when training LLMs with native Italian texts. Minerva demonstrates that an LLM for a specific language brings a number of practical benefits compared to the adaptation of an existing one, including deep control over the composition of the vocabulary and the training data. With this paper, we aim to provide a comprehensive overview of the design choices, results, and evaluation of our Minerva models, showing promising results on Italian benchmarks and downstream tasks. Most importantly, we share what we learned and the findings obtained during the development of Minerva, as we believe that our experience will be valuable for the academic and industrial communities interested in training non-English LLMs from scratch.

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Scirè | Karim Ghonim | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2024

Recent advancements in text summarization, particularly with the advent of Large Language Models (LLMs), have shown remarkable performance. However, a notable challenge persists as a substantial number of automatically-generated summaries exhibit factual inconsistencies, such as hallucinations. In response to this issue, various approaches for the evaluation of consistency for summarization have emerged. Yet, these newly-introduced metrics face several limitations, including lack of interpretability, focus on short document summaries (e.g., news articles), and computational impracticality, especially for LLM-based metrics. To address these shortcomings, we propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE), a more interpretable and efficient factuality-oriented metric. FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation. Moreover, we extend our evaluation to a more challenging setting by conducting a human annotation process of long-form summarization. In the hope of fostering research in summarization factuality evaluation, we release the code of our metric and our factuality annotations of long-form summarization at https://github.com/Babelscape/FENICE.

ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering
Francesco Maria Molfese | Simone Conia | Riccardo Orlando | Roberto Navigli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Current Large Language Models (LLMs) have shown strong reasoning capabilities in commonsense question answering benchmarks, but the process underlying their success remains largely opaque. As a consequence, recent approaches have equipped LLMs with mechanisms for knowledge retrieval, reasoning and introspection, not only to improve their capabilities but also to enhance the interpretability of their outputs. However, these methods require additional training, hand-crafted templates or human-written explanations. To address these issues, we introduce ZEBRA, a zero-shot question answering framework that combines retrieval, case-based reasoning and introspection and dispenses with the need for additional training of the LLM. Given an input question, ZEBRA retrieves relevant question-knowledge pairs from a knowledge base and generates new knowledge by reasoning over the relationships in these pairs. This generated knowledge is then used to answer the input question, improving the model’s performance and interpretability. We evaluate our approach across 8 well-established commonsense reasoning benchmarks, demonstrating that ZEBRA consistently outperforms strong LLMs and previous knowledge integration approaches, achieving an average accuracy improvement of up to 4.5 points.

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
Riccardo Orlando | Pere-Lluís Huguet Cabot | Edoardo Barba | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2024

Entity Linking (EL) and Relation Extraction (RE) are fundamental tasks in Natural Language Processing, serving as critical components in a wide range of applications. In this paper, we propose ReLiK, a Retriever-Reader architecture for both EL and RE, where, given an input text, the Retriever module undertakes the identification of candidate entities or relations that could potentially appear within the text. Subsequently, the Reader module is tasked to discern the pertinent retrieved entities or relations and establish their alignment with the corresponding textual spans. Notably, we put forward an innovative input representation that incorporates the candidate entities or relations alongside the text, making it possible to link entities or extract relations in a single forward pass and to fully leverage pre-trained language models contextualization capabilities, in contrast with previous Retriever-Reader-based methods, which require a forward pass for each candidate. Our formulation of EL and RE achieves state-of-the-art performance in both in-domain and out-of-domain benchmarks while using academic budget training and with up to 40x inference speed compared to competitors. Finally, we show how our architecture can be used seamlessly for Information Extraction (cIE), i.e. EL + RE, and setting a new state of the art by employing a shared Reader that simultaneously extracts entities and relations.

Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin
Iacopo Ghinassi | Simone Tedeschi | Paola Marongiu | Roberto Navigli | Barbara McGillivray
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Word Sense Disambiguation (WSD) is an important task in NLP, which serves the purpose of automatically disambiguating a polysemous word with its most likely sense in context. Recent studies have advanced the state of the art in this task, but most of the work has been carried out on contemporary English or other modern languages, leaving challenges posed by low-resource languages and diachronic change open. Although the problem with low-resource languages has recently been mitigated by using existing multilingual resources to propagate otherwise expensive annotations from English to other languages, such techniques have hitherto not been applied to historical languages such as Latin. In this work, we make the following two major contributions. First, we test such a strategy on a historical language and propose a new approach in this framework which makes use of existing bilingual corpora instead of native English datasets. Second, we fine-tune a Latin WSD model on the data produced and achieve state-of-the-art results on a standard benchmark for the task. Finally, we release the dataset generated with our approach, which is the largest dataset for Latin WSD to date. This work opens the door to further research, as our approach can be used for different historical and, generally, under-resourced languages.

MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus
Simone Conia | Edoardo Barba | Abelardo Carlos Martinez Lorenzo | Pere-Lluís Huguet Cabot | Riccardo Orlando | Luigi Procopio | Roberto Navigli
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Several Natural Language Understanding (NLU) tasks focus on linking text to explicit knowledge, including Word Sense Disambiguation, Semantic Role Labeling, Semantic Parsing, and Relation Extraction. In addition to the importance of connecting raw text with explicit knowledge bases, the integration of such carefully curated knowledge into deep learning models has been shown to be beneficial across a diverse range of applications, including Language Modeling and Machine Translation. Nevertheless, the scarcity of semantically-annotated corpora across various tasks and languages limits the potential advantages significantly. To address this issue, we put forward MOSAICo, the first endeavor aimed at equipping the research community with the key ingredients to model explicit semantic knowledge at a large scale, providing hundreds of millions of silver yet high-quality annotations for four NLU tasks across five languages. We describe the creation process of MOSAICo, demonstrate its quality and variety, and analyze the interplay between different types of semantic information. MOSAICo, available at https://github.com/SapienzaNLP/mosaico, aims to drop the requirement of closed, licensed datasets and represents a step towards a level playing field across languages and tasks in NLU.

NounAtlas: Filling the Gap in Nominal Semantic Role Labeling
Roberto Navigli | Marco Lo Pinto | Pasquale Silvestri | Dennis Rotondi | Simone Ciciliano | Alessandro Scirè
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite significant advances in Semantic Role Labeling (SRL), much work in this field has been carried out with a focus on verbal predicates, with the research on nominal SRL lagging behind. In many contexts, however, nominal predicates are often as informative as verbal ones, thus needing proper treatment. In this paper we aim to fill this gap and make nominal SRL a first-class citizen. We introduce a novel approach to create the first large-scale, high-quality inventory of nominal predicates and organize them into semantically-coherent frames. Although automatically created, NounAtlas – our frame inventory – is subsequently fully validated. We then put forward a technique to generate silver training data for nominal SRL and show that a state-of-the-art SRL model can achieve good performance. Interestingly, thanks to our design choices which enable seamless integration of our predicate inventory with its verbal counterpart, we can mix verbal and nominal data and perform robust SRL on both types of predicates.

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Stefano Perrella | Lorenzo Proietti | Alessandro Scirè | Edoardo Barba | Roberto Navigli
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process’s accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.

Mitigating Data Scarcity in Semantic Parsing across Languages with the Multilingual Semantic Layer and its Dataset
Abelardo Carlos Martinez Lorenzo | Pere-Lluís Huguet Cabot | Karim Ghonim | Lu Xu | Hee-Soo Choi | Alberte Fernández-Castro | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2024

Data scarcity is a prevalent challenge in the era of Large Language Models (LLMs). The insatiable hunger of LLMs for large corpora becomes even more pronounced when dealing with non-English and low-resource languages. The issue is particularly exacerbated in Semantic Parsing (SP), i.e. the task of converting text into a formal representation. The complexity of semantic formalisms makes training human annotators and subsequent data annotation unfeasible on a large scale, especially across languages. To mitigate this, we first introduce the Multilingual Semantic Layer (MSL), a conceptual evolution of previous formalisms, which decouples from disambiguation and external inventories and simplifies the task. MSL provides the necessary tools to encode the meaning across languages, paving the way for developing a high-quality semantic parsing dataset across different languages in a semi-automatic strategy. Subsequently, we manually refine a portion of this dataset and fine-tune GPT-3.5 to propagate these refinements across the dataset. Then, we manually annotate 1,100 sentences in eleven languages, including low-resource ones. Finally, we assess our dataset’s quality, showcasing the performance gap reduction across languages in Semantic Parsing.

CNER: Concept and Named Entity Recognition
Giuliano Martinelli | Francesco Molfese | Simone Tedeschi | Alberte Fernández-Castro | Roberto Navigli
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Named entities – typically expressed via proper nouns – play a key role in Natural Language Processing, as their identification and comprehension are crucial in tasks such as Relation Extraction, Coreference Resolution and Question Answering, among others. Tasks like these also often entail dealing with concepts – typically represented by common nouns – which, however, have not received as much attention. Indeed, the potential of their identification and understanding remains underexplored, as does the benefit of a synergistic formulation with named entities. To fill this gap, we introduce Concept and Named Entity Recognition (CNER), a new unified task that handles concepts and entities mentioned in unstructured texts seamlessly. We put forward a comprehensive set of categories that can be used to model concepts and named entities jointly, and propose new approaches for the creation of CNER datasets. We evaluate the benefits of performing CNER as a unified task extensively, showing that a CNER model gains up to +5.4 and +8 macro F1 points when compared to specialized named entity and concept recognition systems, respectively. Finally, to encourage the development of CNER systems, we release our datasets and models at https://github.com/Babelscape/cner.

Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends
Giuliano Martinelli | Edoardo Barba | Roberto Navigli
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed – yet simple – pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining a 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.

Exploring the Dissociated Nucleus Phenomenon in Semantic Role Labeling
Tommaso Bonomo | Simone Conia | Roberto Navigli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Dependency-based Semantic Role Labeling (SRL) is bound to dependency parsing, as the arguments of a predicate are identified through the token that heads the dependency relation subtree of the argument. However, most dependency-based SRL corpora are susceptible to the dissociated nucleus problem: when a subclause’s semantic and structural cores are two separate words, the dependency tree chooses the structural token as the head of the subtree, coercing the SRL annotation into making the same choice. This leads to undesirable consequences: when directly using the output of a dependency-based SRL method in downstream tasks it is useful to work with the token representing the semantic core of a subclause, not the structural core. In this paper, we carry out a linguistically-driven investigation on the dissociated nucleus problem in dependency-based SRL and propose a novel algorithm that aligns predicate-argument structures to the syntactic structures from Universal Dependencies to select the semantic core of an argument. Our analysis shows that dissociated nuclei appear more often than one could expect, and that our novel algorithm greatly increases the richness of the semantic information in dependency-based SRL. We release the software to reproduce our experiments at http://omitted.link.

Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models
Lorenzo Proietti | Stefano Perrella | Simone Tedeschi | Giulia Vulpis | Leonardo Lavalle | Andrea Sanchietti | Andrea Ferrari | Roberto Navigli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Word Sense Disambiguation (WSD) is a key task in Natural Language Processing (NLP), aiming to assign the correct meaning (sense) to a word in context. However, traditional WSD systems rely on WordNet as the underlying sense inventory, often differentiating meticulously between subtle nuances of word meanings, which may lead to excessive complexity and reduced practicality of WSD systems in today’s NLP. Indeed, current Pretrained Language Models (PLMs) do seem to be able to perform disambiguation, but it is not clear to what extent, or to what level of granularity, they actually operate. In this paper, we address these points and, firstly, introduce a new large-scale resource that leverages homonymy relations to systematically cluster WordNet senses, effectively reducing the granularity of word senses to a very coarse-grained level; secondly, we use this resource to train Homonymy Disambiguation systems and investigate whether PLMs are inherently able to differentiate coarse-grained word senses. Our findings demonstrate that, while state-of-the-art models still struggle to choose the correct fine-grained meaning of a word in context, Homonymy Disambiguation systems are able to differentiate homonyms with up to 95% accuracy scores even without fine-tuning the underlying PLM. We release our data and code at https://github.com/SapienzaNLP/homonymy-wsd.

Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin
Marco Stranisci | Pere-Lluís Huguet Cabot | Elisa Bassignana | Roberto Navigli
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Relation Extraction (RE) is at the core of many Natural Language Understanding tasks, including knowledge-base population and Question Answering. However, any Natural Language Processing system is exposed to biases, and the analysis of these has not received much attention in RE. We propose a new method for inspecting bias in the RE pipeline, which is completely transparent in terms of interpretability. Specifically, in this work we analyze biases related to gender and place of birth. Our methodology includes (i) obtaining semantic triplets (subject, object, semantic relation) involving ‘person’ entities from RE resources, (ii) collecting meta-information (‘gender’ and ‘place of birth’) using Entity Linking technologies, and then (iii) analyze the distribution of triplets across different groups (e.g., men versus women). We investigate bias at two levels: In the training data of three commonly used RE datasets (SREDFM, CrossRE, NYT), and in the predictions of a state-of-the-art RE approach (ReLiK). To enable cross-dataset analysis, we introduce a taxonomy of relation types mapping the label sets of different RE datasets to a unified label space. Our findings reveal that bias is a compounded issue affecting underrepresented groups within data and predictions for RE.

Towards a More Comprehensive Evaluation for Italian LLMs
Luca Moroni | Simone Conia | Federico Martelli | Roberto Navigli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Recent Large Language Models (LLMs) have shown impressive performance in addressing complex aspects of human language. These models have also demonstrated significant capabilities in processing and generating Italian text, achieving state-of-the-art results on current benchmarks for the Italian language. However, the number of such benchmarks is still insufficient. A case in point is the “Open Ita LLM Leaderboard” which only supports three benchmarks, despite being one of the most popular evaluation suite for the evaluation of Italian-speaking LLMs. In this paper, we analyze the current pitfalls of existing evaluation suites and propose two ways to this gap: i) a new suite of automatically-translated benchmarks, drawn from the most popular English benchmarks; and ii) the adaptation of existing manual dataset so that they can be used to complement the evaluation of Italian LLMs. We discuss the pros and cons of both approaches and release all our data to foster further research on the evaluation of Italian-speaking LLMs.

Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
Stefano Perrella | Lorenzo Proietti | Pere-Lluís Huguet Cabot | Edoardo Barba | Roberto Navigli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Machine Translation (MT) evaluation metrics assess translation quality automatically. Recently, researchers have employed MT metrics for various new use cases, such as data filtering and translation re-ranking. However, most MT metrics return assessments as scalar scores that are difficult to interpret, posing a challenge to making informed design choices. Moreover, MT metrics’ capabilities have historically been evaluated using correlation with human judgment, which, despite its efficacy, falls short of providing intuitive insights into metric performance, especially in terms of new metric use cases. To address these issues, we introduce an interpretable evaluation framework for MT metrics. Within this framework, we evaluate metrics in two scenarios that serve as proxies for the data filtering and translation re-ranking use cases. Furthermore, by measuring the performance of MT metrics using Precision, Recall, and F-score, we offer clearer insights into their capabilities than correlation with human judgments. Finally, we raise concerns regarding the reliability of manually curated data following the Direct Assessments+Scalar Quality Metrics (DA+SQM) guidelines, reporting a notably low agreement with Multidimensional Quality Metrics (MQM) annotations.

CroCoAlign: A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts
Francesco Maria Molfese | Andrei Stefan Bejgu | Simone Tedeschi | Simone Conia | Roberto Navigli
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Sentence alignment – establishing links between corresponding sentences in two related documents – is an important NLP task with several downstream applications, such as machine translation (MT). Despite the fact that existing sentence alignment systems have achieved promising results, their effectiveness is based on auxiliary information such as document metadata or machine-generated translations, as well as hyperparameter-sensitive techniques. Moreover, these systems often overlook the crucial role that context plays in the alignment process. In this paper, we address the aforementioned issues and propose CroCoAlign: the first context-aware, end-to-end and fully neural architecture for sentence alignment. Our system maps source and target sentences in long documents by contextualizing their sentence embeddings with respect to the other sentences in the document. We extensively evaluate CroCoAlign on a multilingual dataset consisting of 20 language pairs derived from the Opus project, and demonstrate that our model achieves state-of-the-art performance. To ensure reproducibility, we release our code and model checkpoints at https://github.com/Babelscape/CroCoAlign.

Word Sense Linking: Disambiguating Outside the Sandbox
Andrei Stefan Bejgu | Edoardo Barba | Luigi Procopio | Alberte Fernández-Castro | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2024

Word Sense Disambiguation (WSD) is the task of associating a word in a given context with its most suitable meaning among a set of possible candidates. While the task has recently witnessed renewed interest, with systems achieving performances above the estimated inter-annotator agreement, at the time of writing it still struggles to find downstream applications. We argue that one of the reasons behind this is the difficulty of applying WSD to plain text. Indeed, in the standard formulation, models work under the assumptions that a) all the spans to disambiguate have already been identified, and b) all the possible candidate senses of each span are provided, both of which are requirements that are far from trivial. In this work, we present a new task called Word Sense Linking (WSL) where, given an input text and a reference sense inventory, systems have to both identify which spans to disambiguate and then link them to their most suitable meaning.We put forward a transformer-based architecture for the task and thoroughly evaluate both its performance and those of state-of-the-art WSD systems scaled to WSL, iteratively relaxing the assumptions of WSD. We hope that our work will foster easier integration of lexical semantics into downstream applications.

2023

DMLM: Descriptive Masked Language Modeling
Edoardo Barba | Niccolò Campolungo | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2023

Over the last few years, Masked Language Modeling (MLM) pre-training has resulted in remarkable advancements in many Natural Language Understanding (NLU) tasks, which sparked an interest in researching alternatives and extensions to the MLM objective. In this paper, we tackle the absence of explicit semantic grounding in MLM and propose Descriptive Masked Language Modeling (DMLM), a knowledge-enhanced reading comprehension objective, where the model is required to predict the most likely word in a context, being provided with the word’s definition. For instance, given the sentence “I was going to the _”, if we provided as definition “financial institution”, the model would have to predict the word “bank”; if, instead, we provided “sandy seashore”, the model should predict “beach”. Our evaluation highlights the effectiveness of DMLM in comparison with standard MLM, showing improvements on a number of well-established NLU benchmarks, as well as other semantics-focused tasks, e.g., Semantic Role Labeling. Furthermore, we demonstrate how it is possible to take full advantage of DMLM to embed explicit semantics in downstream tasks, explore several properties of DMLM-based contextual representations and suggest a number of future directions to investigate.

Cross-lingual AMR Aligner: Paying Attention to Cross-Attention
Abelardo Carlos Martínez Lorenzo | Pere Lluís Huguet Cabot | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2023

This paper introduces a novel aligner for Abstract Meaning Representation (AMR) graphs that can scale cross-lingually, and is thus capable of aligning units and spans in sentences of different languages. Our approach leverages modern Transformer-based parsers, which inherently encode alignment information in their cross-attention weights, allowing us to extract this information during parsing. This eliminates the need for English-specific rules or the Expectation Maximization (EM) algorithm that have been used in previous approaches. In addition, we propose a guided supervised method using alignment to further enhance the performance of our aligner. We achieve state-of-the-art results in the benchmarks for AMR alignment and demonstrate our aligner’s ability to obtain them across multiple languages. Our code will be available at [https://www.github.com/babelscape/AMR-alignment](https://www.github.com/babelscape/AMR-alignment).

AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing
Abelardo Carlos Martínez Lorenzo | Pere Lluís Huguet Cabot | Roberto Navigli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we examine the current state-of-the-art in AMR parsing, which relies on ensemble strategies by merging multiple graph predictions. Our analysis reveals that the present models often violate AMR structural constraints. To address this issue, we develop a validation method, and show how ensemble models can exploit SMATCH metric weaknesses to obtain higher scores, but sometimes result in corrupted graphs. Additionally, we highlight the demanding need to compute the SMATCH score among all possible predictions. To overcome these challenges, we propose two novel ensemble strategies based on Transformer models, improving robustness to structural constraints, while also reducing the computational time. Our methods provide new insights for enhancing AMR parsers and metrics. Our code is available at [https://www.github.com/babelscape/AMRs-Assemble](https://www.github.com/babelscape/AMRs-Assemble).

Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities
Riccardo Orlando | Simone Conia | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2023

Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings – newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use such dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. In conclusion, we claim that SRL is far from “solved”, and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates. We release our software and datasets at https://github.com/sapienzanlp/exploring-srl.

Echoes from Alexandria: A Large Resource for Multilingual Book Summarization
Alessandro Scirè | Simone Conia | Simone Ciciliano | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2023

In recent years, research in text summarization has mainly focused on the news domain, where texts are typically short and have strong layout features. The task of full-book summarization presents additional challenges which are hard to tackle with current resources, due to their limited size and availability in English only. To overcome these limitations, we present “Echoes from Alexandria”, or in shortened form, “Echoes”, a large resource for multilingual book summarization. Echoes featuresthree novel datasets: i) Echo-Wiki, for multilingual book summarization, ii) Echo-XSum, for extremely-compressive multilingual book summarization, and iii) Echo-FairySum, for extractive book summarization. To the best of our knowledge, Echoes – with its thousands of books and summaries – is the largest resource, and the first to be multilingual, featuring 5 languages and 25 language pairs. In addition to Echoes, we also introduce a new extractive-then-abstractive baseline, and, supported by our experimental results and manual analysis of the summaries generated, we argue that this baseline is more suitable for book summarization than purely-abstractive approaches. We release our resource and software at https://github.com/Babelscape/echoes-from-alexandria in the hope of fostering innovative research in multilingual booksummarization.

RED^FM: a Filtered and Multilingual Relation Extraction Dataset
‪Pere-Lluís Huguet Cabot | Simone Tedeschi | Axel-Cyrille Ngonga Ngomo | Roberto Navigli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English.In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SRED^FM, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose RED^FM, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at [https://www.github.com/babelscape/rebel](https://www.github.com/babelscape/rebel).

Incorporating Graph Information in Transformer-based AMR Parsing
Pavlo Vasylenko | Pere Lluís Huguet Cabot | Abelardo Carlos Martínez Lorenzo | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL 2023

Abstract Meaning Representation (AMR) is a Semantic Parsing formalism that aims at providing a semantic graph abstraction representing a given text. Current approaches are based on autoregressive language models such as BART or T5, fine-tuned through Teacher Forcing to obtain a linearized version of the AMR graph from a sentence. In this paper, we present LeakDistill, a model and method that explores a modification to the Transformer architecture, using structural adapters to explicitly incorporate graph information into the learned representations and improve AMR parsing performance. Our experiments show how, by employing word-to-node alignment to embed graph structural information into the encoder at training time, we can obtain state-of-the-art AMR parsing through self-knowledge distillation, even without the use of additional data. We release the code at [http://www.github.com/sapienzanlp/LeakDistill](http://www.github.com/sapienzanlp/LeakDistill).

Entity Disambiguation with Entity Definitions
Luigi Procopio | Simone Conia | Edoardo Barba | Roberto Navigli
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Local models have recently attained astounding performances in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous works have so far limited their studies to using, as the textual representation of each candidate, only its Wikipedia title. Although certainly effective, this strategy presents a few critical issues, especially when titles are not sufficiently informative or distinguishable from one another. In this paper, we address this limitation and investigate the extent to which more expressive textual representations can mitigate it. We evaluate our approach thoroughly against standard benchmarks in ED and find extractive formulations to be particularly well-suited to such representations. We report a new state of the art on 2 out of the 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns. We release our code, data and model checkpoints at https://github.com/SapienzaNLP/extend.

What’s the Meaning of Superhuman Performance in Today’s NLU?
Simone Tedeschi | Johan Bos | Thierry Declerck | Jan Hajič | Daniel Hershcovich | Eduard Hovy | Alexander Koller | Simon Krek | Steven Schockaert | Rico Sennrich | Ekaterina Shutova | Roberto Navigli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.

APatt at SemEval-2023 Task 3: The Sapienza NLP System for Ensemble-based Multilingual Propaganda Detection
Antonio Purificato | Roberto Navigli
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this paper, we present our approach to the task of identification of persuasion techniques in text, which is a subtask of the SemEval-2023 Task 3 on the multilingual detection of genre, framing, and persuasion techniques in online news. The subtask is multi-label at the paragraph level and the inventory considered by the organizers covers 23 persuasion techniques. Our solution is based on an ensemble of a variety of pre-trained language models (PLMs) fine-tuned on the propaganda dataset. We first describe our system, the different experimental setups we considered, and then provide the results on the dev and test sets released by the organizers. The official evaluation shows our solution ranks 1st in English and attains high scores in all the other languages, i.e. French, German, Italian, Polish, and Russian. We also perform an extensive analysis of the data and the annotations to investigate how they can influence the quality of our systems.

LexicoMatic: Automatic Creation of Multilingual Lexical-Semantic Dictionaries
Federico Martelli | Luigi Procopio | Edoardo Barba | Roberto Navigli
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Code-Switching with Word Senses for Pretraining in Neural Machine Translation
Vivek Iyer | Edoardo Barba | Alexandra Birch | Jeff Pan | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2023

Lexical ambiguity is a significant and pervasive challenge in Neural Machine Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to handle polysemous words (Campolungo et al., 2022). The same holds for the NMT pretraining paradigm of denoising synthetic “code-switched” text (Pan et al., 2021; Iyer et al., 2023), where word senses are ignored in the noising stage – leading to harmful sense biases in the pretraining data that are subsequently inherited by the resulting models. In this work, we introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT) - an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases. Our experiments show significant improvements in overall translation quality. Then, we show the robustness of our approach to scale to various challenging data and resource-scarce scenarios and, finally, report fine-grained accuracy improvements on the DiBiMT disambiguation benchmark. Our studies yield interesting and novel insights into the merits and challenges of integrating word sense information and structured knowledge in multilingual pretraining for NMT.

2022

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation
Sedrick Scott Keh | Rohit Bharadwaj | Emmy Liu | Simone Tedeschi | Varun Gangal | Roberto Navigli
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881.

NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection
Simone Tedeschi | Roberto Navigli
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Idioms are lexically-complex phrases whose meaning cannot be derived by compositionally interpreting their components. Although the automatic identification and understanding of idioms is essential for a wide range of Natural Language Understanding tasks, they are still largely under-investigated. This motivated the organization of the SemEval-2022 Task 2, which is divided into two multilingual subtasks: one about idiomaticity detection, and the other about sentence embeddings. In this work, we focus on the first subtask and propose a Transformer-based dual-encoder architecture to compute the semantic similarity between a potentially-idiomatic expression and its context and, based on this, predict idiomaticity. Then, we show how and to what extent Named Entity Recognition can be exploited to reduce the degree of confusion of idiom identification systems and, therefore, improve performance. Our model achieves 92.1 F1 in the one-shot setting and shows strong robustness towards unseen idioms achieving 77.4 F1 in the zero-shot setting. We release our code at https://github.com/Babelscape/ner4id.

DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation
Niccolò Campolungo | Federico Martelli | Francesco Saina | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that models pick up semantic biases existing in the training data, thus producing translation errors. In this paper, we present DiBiMT, the first entirely manually-curated evaluation benchmark which enables an extensive study of semantic biases in Machine Translation of nominal and verbal words in five different language combinations, namely, English and one or other of the following languages: Chinese, German, Italian, Russian and Spanish. Furthermore, we test state-of-the-art Machine Translation systems, both commercial and non-commercial ones, against our new test bed and provide a thorough statistical and linguistic analysis of the results. We release DiBiMT at https://nlp.uniroma1.it/dibimt as a closed benchmark with a public leaderboard.

Nibbling at the Hard Core of Word Sense Disambiguation
Marco Maru | Simone Conia | Michele Bevilacqua | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks. In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.

MaTESe: Machine Translation Evaluation as a Sequence Tagging Problem
Stefano Perrella | Lorenzo Proietti | Alessandro Scirè | Niccolò Campolungo | Roberto Navigli
Proceedings of the Seventh Conference on Machine Translation (WMT)

Starting from last year, WMT human evaluation has been performed within the Multidimensional Quality Metrics (MQM) framework, where human annotators are asked to identify error spans in translations, alongside an error category and a severity. In this paper, we describe our submission to the WMT 2022 Metrics Shared Task, where we propose using the same paradigm for automatic evaluation: we present the MaTESe metrics, which reframe machine translation evaluation as a sequence tagging problem. Our submission also includes a reference-free metric, denominated MaTESe-QE. Despite the paucity of the openly available MQM data, our metrics obtain promising results, showing high levels of correlation with human judgements, while also enabling an evaluation that is interpretable. Moreover, MaTESe-QE can also be employed in settings where it is infeasible to curate reference translations manually.

SemEval-2022 Task 9: R2VQ – Competence-based Multimodal Question Answering
Jingxuan Tu | Eben Holderness | Marco Maru | Simone Conia | Kyeongmin Rim | Kelley Lynch | Richard Brutti | Roberto Navigli | James Pustejovsky
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this task, we identify a challenge that is reflective of linguistic and cognitive competencies that humans have when speaking and reasoning. Particularly, given the intuition that textual and visual information mutually inform each other for semantic reasoning, we formulate a Competence-based Question Answering challenge, designed to involve rich semantic annotation and aligned text-video objects. The task is to answer questions from a collection of cooking recipes and videos, where each question belongs to a “question family” reflecting a specific reasoning competence. The data and task result is publicly available.

Probing for Predicate Argument Structures in Pretrained Language Models
Simone Conia | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Thanks to the effectiveness and wide availability of modern pretrained language models (PLMs), recently proposed approaches have achieved remarkable results in dependency- and span-based, multilingual and cross-lingual Semantic Role Labeling (SRL). These results have prompted researchers to investigate the inner workings of modern PLMs with the aim of understanding how, where, and to what extent they encode information about SRL. In this paper, we follow this line of research and probe for predicate argument structures in PLMs. Our study shows that PLMs do encode semantic structures directly into the contextualized representation of a predicate, and also provides insights into the correlation between predicate senses and their structures, the degree of transferability between nominal and verbal structures, and how such structures are encoded across languages. Finally, we look at the practical implications of such insights and demonstrate the benefits of embedding predicate argument structure information into an SRL model.

Fully-Semantic Parsing and Generation: the BabelNet Meaning Representation
Abelardo Carlos Martínez Lorenzo | Marco Maru | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A language-independent representation of meaning is one of the most coveted dreams in Natural Language Understanding. With this goal in mind, several formalisms have been proposed as frameworks for meaning representation in Semantic Parsing. And yet, the dependencies these formalisms share with respect to language-specific repositories of knowledge make the objective of closing the gap between high- and low-resourced languages hard to accomplish. In this paper, we present the BabelNet Meaning Representation (BMR), an interlingual formalism that abstracts away from language-specific constraints by taking advantage of the multilingual semantic resources of BabelNet and VerbAtlas. We describe the rationale behind the creation of BMR and put forward BMR 1.0, a dataset labeled entirely according to the new formalism. Moreover, we show how BMR is able to outperform previous formalisms thanks to its fully-semantic framing, which enables top-notch multilingual parsing and generation. We release the code at https://github.com/SapienzaNLP/bmr.

A Tour of Explicit Multilingual Semantics: Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing
Roberto Navigli | Edoardo Barba | Simone Conia | Rexhina Blloshmi
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Tutorial Abstracts

The recent advent of modern pretrained language models has sparked a revolution in Natural Language Processing (NLP), especially in multilingual and cross-lingual applications. Today, such language models have become the de facto standard for providing rich input representations to neural systems, achieving unprecedented results in an increasing range of benchmarks. However, questions that often arise are: firstly, whether current language models are, indeed, able to capture explicit, symbolic meaning; secondly, if they are, to what extent; thirdly, and perhaps more importantly, whether current approaches are capable of scaling across languages. In this cutting-edge tutorial, we will review recent efforts that have aimed at shedding light on meaning in NLP, with a focus on three key open problems in lexical and sentence-level semantics: Word Sense Disambiguation, Semantic Role Labeling, and Semantic Parsing. After a brief introduction, we will spotlight how state-of-the-art models tackle these tasks in multiple languages, showing where they excel and where they fail. We hope that this tutorial will broaden the audience interested in multilingual semantics and inspire researchers to further advance the field.

SRL4E – Semantic Role Labeling for Emotions: A Unified Evaluation Framework
Cesare Campagnano | Simone Conia | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the field of sentiment analysis, several studies have highlighted that a single sentence may express multiple, sometimes contrasting, sentiments and emotions, each with its own experiencer, target and/or cause. To this end, over the past few years researchers have started to collect and annotate data manually, in order to investigate the capabilities of automatic systems not only to distinguish between emotions, but also to capture their semantic constituents. However, currently available gold datasets are heterogeneous in size, domain, format, splits, emotion categories and role labels, making comparisons across different works difficult and hampering progress in the area. In this paper, we tackle this issue and present a unified evaluation framework focused on Semantic Role Labeling for Emotions (SRL4E), in which we unify several datasets tagged with emotions and semantic roles by using a common labeling scheme. We use SRL4E as a benchmark to evaluate how modern pretrained language models perform and analyze where we currently stand in this task, hoping to provide the tools to facilitate studies in this complex area.

Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures
Simone Conia | Edoardo Barba | Alessandro Scirè | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2022

One of the common traits of past and present approaches for Semantic Role Labeling (SRL) is that they rely upon discrete labels drawn from a predefined linguistic inventory to classify predicate senses and their arguments.However, we argue this need not be the case. In this paper, we present an approach that leverages Definition Modeling to introduce a generalized formulation of SRL as the task of describing predicate-argument structures using natural language definitions instead of discrete labels. Our novel formulation takes a first step towards placing interpretability and flexibility foremost, and yet our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance. We release our software for research purposes at https://github.com/SapienzaNLP/dsrl.

ID10M: Idiom Identification in 10 Languages
Simone Tedeschi | Federico Martelli | Roberto Navigli
Findings of the Association for Computational Linguistics: NAACL 2022

Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.

Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing
Riccardo Orlando | Simone Conia | Stefano Faralli | Roberto Navigli
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present the Universal Semantic Annotator (USeA), which offers the first unified API for high-quality automatic annotations of texts in 100 languages through state-of-the-art systems for Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing. Together, such annotations can be used to provide users with rich and diverse semantic information, help second-language learners, and allow researchers to integrate explicit semantic knowledge into downstream tasks and real-world applications.

Reducing Disambiguation Biases in NMT by Leveraging Explicit Word Sense Information
Niccolò Campolungo | Tommaso Pasini | Denis Emelin | Roberto Navigli
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent studies have shed some light on a common pitfall of Neural Machine Translation (NMT) models, stemming from their struggle to disambiguate polysemous words without lapsing into their most frequently occurring senses in the training corpus. In this paper, we first provide a novel approach for automatically creating high-precision sense-annotated parallel corpora, and then put forward a specifically tailored fine-tuning strategy for exploiting these sense annotations during training without introducing any additional requirement at inference time. The use of explicit senses proved to be beneficial to reduce the disambiguation bias of a baseline NMT model, while, at the same time, leading our system to attain higher BLEU scores than its vanilla counterpart in 3 language pairs.

MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)
Simone Tedeschi | Roberto Navigli
Findings of the Association for Computational Linguistics: NAACL 2022

Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre and to cover a narrow set of languages, thus limiting the general applicability of NER systems. In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories and 2 textual genres. We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER.In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems. We release our dataset at https://github.com/Babelscape/multinerd.

ExtEnD: Extractive Entity Disambiguation
Edoardo Barba | Luigi Procopio | Roberto Navigli
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Local models for Entity Disambiguation (ED) have today become extremely powerful, in most part thanks to the advent of large pre-trained language models. However, despite their significant performance achievements, most of these approaches frame ED through classification formulations that have intrinsic limitations, both computationally and from a modeling perspective. In contrast with this trend, here we propose ExtEnD, a novel local formulation for ED where we frame this task as a text extraction problem, and present two Transformer-based architectures that implement it. Based on experiments in and out of domain, and training over two different data regimes, we find our approach surpasses all its competitors in terms of both data efficiency and raw performance. ExtEnD outperforms its alternatives by as few as 6 F1 points on the more constrained of the two data regimes and, when moving to the other higher-resourced regime, sets a new state of the art on 4 out of 4 benchmarks under consideration, with average improvements of 0.7 F1 points overall and 1.1 F1 points out of domain. In addition, to gain better insights from our results, we also perform a fine-grained evaluation of our performances on different classes of label frequency, along with an ablation study of our architectural choices and an error analysis. We release our code and models for research purposes at https://github.com/SapienzaNLP/extend.

2021

ConSeC: Word Sense Disambiguation as Continuous Sense Comprehension
Edoardo Barba | Luigi Procopio | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Supervised systems have nowadays become the standard recipe for Word Sense Disambiguation (WSD), with Transformer-based language models as their primary ingredient. However, while these systems have certainly attained unprecedented performances, virtually all of them operate under the constraining assumption that, given a context, each word can be disambiguated individually with no account of the other sense choices. To address this limitation and drop this assumption, we propose CONtinuous SEnse Comprehension (ConSeC), a novel approach to WSD: leveraging a recent re-framing of this task as a text extraction problem, we adapt it to our formulation and introduce a feedback loop strategy that allows the disambiguation of a target word to be conditioned not only on its context but also on the explicit senses assigned to nearby words. We evaluate ConSeC and examine how its components lead it to surpass all its competitors and set a new state of the art on English WSD. We also explore how ConSeC fares in the cross-lingual setting, focusing on 8 languages with various degrees of resource availability, and report significant improvements over prior systems. We release our code at https://github.com/SapienzaNLP/consec.

WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER
Simone Tedeschi | Valentino Maiorca | Niccolò Campolungo | Francesco Cecconi | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2021

Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.

SPRING Goes Online: End-to-End AMR Parsing and Generation
Rexhina Blloshmi | Michele Bevilacqua | Edoardo Fabiano | Valentina Caruso | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper we present SPRING Online Services, a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation). The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public. It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output. Moreover, our RESTful APIs enable easy integration of SPRING in downstream applications where AMR structures are needed. Finally, we make SPRING Online Services freely available at http://nlp.uniroma1.it/spring and, in addition, we release extra model checkpoints to be used with the original SPRING Python code.

UniteD-SRL: A Unified Dataset for Span- and Dependency-Based Multilingual and Cross-Lingual Semantic Role Labeling
Rocco Tripodi | Simone Conia | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2021

Multilingual and cross-lingual Semantic Role Labeling (SRL) have recently garnered increasing attention as multilingual text representation techniques have become more effective and widely available. While recent work has attained growing success, results on gold multilingual benchmarks are still not easily comparable across languages, making it difficult to grasp where we stand. For example, in CoNLL-2009, the standard benchmark for multilingual SRL, language-to-language comparisons are affected by the fact that each language has its own dataset which differs from the others in size, domains, sets of labels and annotation guidelines. In this paper, we address this issue and propose UniteD-SRL, a new benchmark for multilingual and cross-lingual, span- and dependency-based SRL. UniteD-SRL provides expert-curated parallel annotations using a common predicate-argument structure inventory, allowing direct comparisons across languages and encouraging studies on cross-lingual transfer in SRL. We release UniteD-SRL v1.0 at https://github.com/SapienzaNLP/united-srl.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Chengqing Zong | Fei Xia | Wenjie Li | Roberto Navigli
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Integrating Personalized PageRank into Neural Word Sense Disambiguation
Ahmed El Sheikh | Michele Bevilacqua | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural Word Sense Disambiguation (WSD) has recently been shown to benefit from the incorporation of pre-existing knowledge, such as that coming from the WordNet graph. However, state-of-the-art approaches have been successful in exploiting only the local structure of the graph, with only close neighbors of a given synset influencing the prediction. In this work, we improve a classification model by recomputing logits as a function of both the vanilla independently produced logits and the global WordNet graph. We achieve this by incorporating an online neural approximated PageRank, which enables us to refine edge weights as well. This method exploits the global graph structure while keeping space requirements linear in the number of edges. We obtain strong improvements, matching the current state of the art. Code is available at https://github.com/SapienzaNLP/neural-pagerank-wsd

IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages
Rexhina Blloshmi | Tommaso Pasini | Niccolò Campolungo | Somnath Banerjee | Roberto Navigli | Gabriella Pasi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of few keywords only, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across French, German, Italian and Spanish languages on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.

SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)
Federico Martelli | Najla Kalach | Gabriele Tola | Roberto Navigli
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context disambiguation (MCL-WiC). This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory. Framed as a binary classification, our task is divided into two parts. In the multilingual sub-task, participating systems are required to determine whether two target words, each occurring in a different context within the same language, express the same meaning or not. Instead, in the cross-lingual part, systems are asked to perform the task in a cross-lingual scenario, in which the two target words and their corresponding contexts are provided in two different languages. We illustrate our task, as well as the construction of our manually-created dataset including five languages, namely Arabic, Chinese, English, French and Russian, and the results of the participating systems. Datasets and results are available at: https://github.com/SapienzaNLP/mcl-wic.

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources
Simone Conia | Andrea Bacciu | Roberto Navigli
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that, not only is our cross-lingual system competitive with the current state of the art but that it is also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages. We release our code and model at https://github.com/SapienzaNLP/unify-srl.

REBEL: Relation Extraction By End-to-end Language generation
Pere-Lluís Huguet Cabot | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2021

Extracting relation triplets from raw text is a crucial task in Information Extraction, enabling multiple applications such as populating or validating knowledge bases, factchecking, and other downstream tasks. However, it usually involves multiple-step pipelines that propagate errors or are limited to a small number of relation types. To overcome these issues, we propose the use of autoregressive seq2seq models. Such models have previously been shown to perform well not only in language generation, but also in NLU tasks such as Entity Linking, thanks to their framing as seq2seq tasks. In this paper, we show how Relation Extraction can be simplified by expressing triplets as a sequence of text and we present REBEL, a seq2seq model based on BART that performs end-to-end relation extraction for more than 200 different relation types. We show our model’s flexibility by fine-tuning it on an array of Relation Extraction and Relation Classification benchmarks, with it attaining state-of-the-art performance in most of them.

GeneSis: A Generative Approach to Substitutes in Context
Caterina Lacerra | Rocco Tripodi | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The lexical substitution task aims at generating a list of suitable replacements for a target word in context, ideally keeping the meaning of the modified text unchanged. While its usage has increased in recent years, the paucity of annotated data prevents the finetuning of neural models on the task, hindering the full fruition of recently introduced powerful architectures such as language models. Furthermore, lexical substitution is usually evaluated in a framework that is strictly bound to a limited vocabulary, making it impossible to credit appropriate, but out-of-vocabulary, substitutes. To assess these issues, we proposed GeneSis (Generating Substitutes in contexts), the first generative approach to lexical substitution. Thanks to a seq2seq model, we generate substitutes for a word according to the context it appears in, attaining state-of-the-art results on different benchmarks. Moreover, our approach allows silver data to be produced for further improving the performances of lexical substitution systems. Along with an extensive analysis of GeneSis results, we also present a human evaluation of the generated substitutes in order to assess their quality. We release the fine-tuned models, the generated datasets, and the code to reproduce the experiments at https://github.com/SapienzaNLP/genesis.

InVeRo-XL: Making Cross-Lingual Semantic Role Labeling Accessible with Intelligible Verbs and Roles
Simone Conia | Riccardo Orlando | Fabrizio Brignone | Francesco Cecconi | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Notwithstanding the growing interest in cross-lingual techniques for Natural Language Processing, there has been a surprisingly small number of efforts aimed at the development of easy-to-use tools for cross-lingual Semantic Role Labeling. In this paper, we fill this gap and present InVeRo-XL, an off-the-shelf state-of-the-art system capable of annotating text with predicate sense and semantic role labels from 7 predicate-argument structure inventories in more than 40 languages. We hope that our system – with its easy-to-use RESTful API and Web interface – will become a valuable tool for the research community, encouraging the integration of sentence-level semantics into cross-lingual downstream tasks. InVeRo-XL is available online at http://nlp.uniroma1.it/invero.

Named Entity Recognition for Entity Linking: What Works and What’s Next
Simone Tedeschi | Simone Conia | Francesco Cecconi | Roberto Navigli
Findings of the Association for Computational Linguistics: EMNLP 2021

Entity Linking (EL) systems have achieved impressive results on standard benchmarks mainly thanks to the contextualized representations provided by recent pretrained language models. However, such systems still require massive amounts of data – millions of labeled examples – to perform at their best, with training times that often exceed several days, especially when limited computational resources are available. In this paper, we look at how Named Entity Recognition (NER) can be exploited to narrow the gap between EL systems trained on high and low amounts of labeled data. More specifically, we show how and to what extent an EL system can benefit from NER to enhance its entity representations, improve candidate selection, select more effective negative samples and enforce hard and soft constraints on its output entities. We release our software – code and model checkpoints – at https://github.com/Babelscape/ner4el.

SGL: Speaking the Graph Languages of Semantic Parsing via Multilingual Translation
Luigi Procopio | Rocco Tripodi | Roberto Navigli
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Graph-based semantic parsing aims to represent textual meaning through directed graphs. As one of the most promising general-purpose meaning representations, these structures and their parsing have gained a significant interest momentum during recent years, with several diverse formalisms being proposed. Yet, owing to this very heterogeneity, most of the research effort has focused mainly on solutions specific to a given formalism. In this work, instead, we reframe semantic parsing towards multiple formalisms as Multilingual Neural Machine Translation (MNMT), and propose SGL, a many-to-many seq2seq architecture trained with an MNMT objective. Backed by several experiments, we show that this framework is indeed effective once the learning procedure is enhanced with large parallel corpora coming from Machine Translation: we report competitive performances on AMR and UCCA parsing, especially once paired with pre-trained architectures. Furthermore, we find that models trained under this configuration scale remarkably well to tasks such as cross-lingual AMR parsing: SGL outperforms all its competitors by a large margin without even explicitly seeing non-English to AMR examples at training time and, once these examples are included as well, sets an unprecedented state of the art in this task. We release our code and our models for research purposes at https://github.com/SapienzaNLP/sgl.

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Chengqing Zong | Fei Xia | Wenjie Li | Roberto Navigli
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Chengqing Zong | Fei Xia | Wenjie Li | Roberto Navigli
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration
Simone Conia | Roberto Navigli
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recent studies treat Word Sense Disambiguation (WSD) as a single-label classification problem in which one is asked to choose only the best-fitting sense for a target word, given its context. However, gold data labelled by expert annotators suggest that maximizing the probability of a single sense may not be the most suitable training objective for WSD, especially if the sense inventory of choice is fine-grained. In this paper, we approach WSD as a multi-label classification problem in which multiple senses can be assigned to each target word. Not only does our simple method bear a closer resemblance to how human annotators disambiguate text, but it can also be seamlessly extended to exploit structured knowledge from semantic networks to achieve state-of-the-art results in English all-words WSD.

ESC: Redesigning WSD with Extractive Sense Comprehension
Edoardo Barba | Tommaso Pasini | Roberto Navigli
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Word Sense Disambiguation (WSD) is a historical NLP task aimed at linking words in contexts to discrete sense inventories and it is usually cast as a multi-label classification task. Recently, several neural approaches have employed sense definitions to better represent word meanings. Yet, these approaches do not observe the input sentence and the sense definition candidates all at once, thus potentially reducing the model performance and generalization power. We cope with this issue by reframing WSD as a span extraction problem — which we called Extractive Sense Comprehension (ESC) — and propose ESCHER, a transformer-based neural architecture for this new formulation. By means of an extensive array of experiments, we show that ESC unleashes the full potential of our model, leading it to outdo all of its competitors and to set a new state of the art on the English WSD task. In the few-shot scenario, ESCHER proves to exploit training data efficiently, attaining the same performance as its closest competitor while relying on almost three times fewer annotations. Furthermore, ESCHER can nimbly combine data annotated with senses from different lexical resources, achieving performances that were previously out of everyone’s reach. The model along with data is available at https://github.com/SapienzaNLP/esc.

AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation
Riccardo Orlando | Simone Conia | Fabrizio Brignone | Francesco Cecconi | Roberto Navigli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Over the past few years, Word Sense Disambiguation (WSD) has received renewed interest: recently proposed systems have shown the remarkable effectiveness of deep learning techniques in this task, especially when aided by modern pretrained language models. Unfortunately, such systems are still not available as ready-to-use end-to-end packages, making it difficult for researchers to take advantage of their performance. The only alternative for a user interested in applying WSD to downstream tasks is to rely on currently available end-to-end WSD systems, which, however, still rely on graph-based heuristics or non-neural machine learning algorithms. In this paper, we fill this gap and propose AMuSE-WSD, the first end-to-end system to offer high-quality sense information in 40 languages through a state-of-the-art neural model for WSD. We hope that AMuSE-WSD will provide a stepping stone for the integration of meaning into real-world applications and encourage further studies in lexical semantics. AMuSE-WSD is available online at http://nlp.uniroma1.it/amuse-wsd.

2020

Building Semantic Grams of Human Knowledge
Valentina Leone | Giovanni Siragusa | Luigi Di Caro | Roberto Navigli
Proceedings of the Twelfth Language Resources and Evaluation Conference

Word senses are typically defined with textual definitions for human consumption and, in computational lexicons, put in context via lexical-semantic relations such as synonymy, antonymy, hypernymy, etc. In this paper we embrace a radically different paradigm that provides a slot-filler structure, called “semagram”, to define the meaning of words in terms of their prototypical semantic information. We propose a semagram-based knowledge model composed of 26 semantic relationships which integrates features from a range of different sources, such as computational lexicons and property norms. We describe an annotation exercise regarding 50 concepts over 10 different categories and put forward different automated approaches for extending the semagram base to thousands of concepts. We finally evaluated the impact of the proposed resource on a semantic similarity task, showing significant improvements over state-of-the-art word embeddings.

Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts
Agostina Calabrese | Michele Bevilacqua | Roberto Navigli
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Thanks to the wealth of high-quality annotated images available in popular repositories such as ImageNet, multimodal language-vision research is in full bloom. However, events, feelings and many other kinds of concepts which can be visually grounded are not well represented in current datasets. Nevertheless, we would expect a wide-coverage language understanding system to be able to classify images depicting recess and remorse, not just cats, dogs and bridges. We fill this gap by presenting BabelPic, a hand-labeled dataset built by cleaning the image-synset association found within the BabelNet Lexical Knowledge Base (LKB). BabelPic explicitly targets non-concrete concepts, thus providing refreshing new data for the community. We also show that pre-trained language-vision systems can be used to further expand the resource by exploiting natural language knowledge available in the LKB. BabelPic is available for download at http://babelpic.org.

Invited Talk: Generationary or: “How We Went beyond Sense Inventories and Learned to Gloss”
Roberto Navigli
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

In this talk I present Generationary, an approach that goes beyond the mainstream assumption that word senses can be represented as discrete items of a predefined inventory, and put forward a unified model which produces contextualized definitions for arbitrary lexical items, from words to phrases and even sentences. Generationary employs a novel span-based encoding scheme to fine-tune an English pre-trained Encoder-Decoder system and generate new definitions. Our model outperforms previous approaches in the generative task of Definition Modeling in many settings, but it also matches or surpasses the state of the art in discriminative tasks such as Word Sense Disambiguation and Word-in-Context. I also show that Generationary benefits from training on definitions from multiple inventories, with strong gains across benchmarks, including a novel dataset of definitions for free adjective-noun phrases, and discuss interesting examples of generated definitions. Joint work with Michele Bevilacqua and Marco Maru.

Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations
Simone Conia | Roberto Navigli
Proceedings of the 28th International Conference on Computational Linguistics

To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception – its software and the complete set of representations – is available at https://github.com/SapienzaNLP/conception.

Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation
Federico Scozzafava | Marco Maru | Fabrizio Brignone | Giovanni Torrisi | Roberto Navigli
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Exploiting syntagmatic information is an encouraging research focus to be pursued in an effort to close the gap between knowledge-based and supervised Word Sense Disambiguation (WSD) performance. We follow this direction in our next-generation knowledge-based WSD system, SyntagRank, which we make available via a Web interface and a RESTful API. SyntagRank leverages the disambiguated pairs of co-occurring words included in SyntagNet, a lexical-semantic combination resource, to perform state-of-the-art knowledge-based WSD in a multilingual setting. Our service provides both a user-friendly interface, available at http://syntagnet.org/, and a RESTful endpoint to query the system programmatically (accessible at http://api.syntagnet.org/).

InVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles
Simone Conia | Fabrizio Brignone | Davide Zanfardino | Roberto Navigli
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pretrained implementation of a neural, span-based architecture for SRL. Both the resource and the system provide human-readable verb sense and semantic role information, with an easy to use Web interface and RESTful APIs available at http://nlp.uniroma1.it/invero.

Sense-Annotated Corpora for Word Sense Disambiguation in Multiple Languages and Domains
Bianca Scarlini | Tommaso Pasini | Roberto Navigli
Proceedings of the Twelfth Language Resources and Evaluation Conference

The knowledge acquisition bottleneck problem dramatically hampers the creation of sense-annotated data for Word Sense Disambiguation (WSD). Sense-annotated data are scarce for English and almost absent for other languages. This limits the range of action of deep-learning approaches, which today are at the base of any NLP task and are hungry for data. We mitigate this issue and encourage further research in multilingual WSD by releasing to the NLP community five large datasets annotated with word-senses in five different languages, namely, English, French, Italian, German and Spanish, and 5 distinct datasets in English, each for a different semantic domain. We show that supervised WSD models trained on our data attain higher performance than when trained on other automatically-created corpora. We release all our data containing more than 15 million annotated instances in 5 different languages at http://trainomatic.org/onesec.

Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information
Michele Bevilacqua | Roberto Navigli
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural architectures are the current state of the art in Word Sense Disambiguation (WSD). However, they make limited use of the vast amount of relational information encoded in Lexical Knowledge Bases (LKB). We present Enhanced WSD Integrating Synset Embeddings and Relations (EWISER), a neural supervised architecture that is able to tap into this wealth of knowledge by embedding information from the LKB graph within the neural architecture, and to exploit pretrained synset embeddings, enabling the network to predict synsets that are not in the training set. As a result, we set a new state of the art on almost all the evaluation settings considered, also breaking through, for the first time, the 80% ceiling on the concatenation of all the standard all-words English WSD evaluation benchmarks. On multilingual all-words WSD, we report state-of-the-art results by training on nothing but English.

XL-AMR: Enabling Cross-Lingual AMR Parsing with Transfer Learning Techniques
Rexhina Blloshmi | Rocco Tripodi | Roberto Navigli
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstract Meaning Representation (AMR) is a popular formalism of natural language that represents the meaning of a sentence as a semantic graph. It is agnostic about how to derive meanings from strings and for this reason it lends itself well to the encoding of semantics across languages. However, cross-lingual AMR parsing is a hard task, because training data are scarce in languages other than English and the existing English AMR parsers are not directly suited to being used in a cross-lingual setting. In this work we tackle these two problems so as to enable cross-lingual AMR parsing: we explore different transfer learning techniques for producing automatic AMR annotations across languages and develop a cross-lingual AMR parser, XL-AMR. This can be trained on the produced data and does not rely on AMR aligners or source-copy mechanisms as is commonly the case in English AMR parsing. The results of XL-AMR significantly surpass those previously reported in Chinese, German, Italian and Spanish. Finally we provide a qualitative analysis which sheds light on the suitability of AMR across languages. We release XL-AMR at github.com/SapienzaNLP/xl-amr.

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”
Michele Bevilacqua | Marco Maru | Roberto Navigli
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Mainstream computational lexical semantics embraces the assumption that word senses can be represented as discrete items of a predefined inventory. In this paper we show this needs not be the case, and propose a unified model that is able to produce contextually appropriate definitions. In our model, Generationary, we employ a novel span-based encoding scheme which we use to fine-tune an English pre-trained Encoder-Decoder system to generate glosses. We show that, even though we drop the need of choosing from a predefined sense inventory, our model can be employed effectively: not only does Generationary outperform previous approaches in the generative task of Definition Modeling in many settings, but it also matches or surpasses the state of the art in discriminative tasks such as Word Sense Disambiguation and Word-in-Context. Finally, we show that Generationary benefits from training on data from multiple inventories, with strong gains on various zero-shot benchmarks, including a novel dataset of definitions for free adjective-noun phrases. The software and reproduction materials are available at http://generationary.org.

Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach
Simone Conia | Roberto Navigli
Proceedings of the 28th International Conference on Computational Linguistics

Recent research indicates that taking advantage of complex syntactic features leads to favorable results in Semantic Role Labeling. Nonetheless, an analysis of the latest state-of-the-art multilingual systems reveals the difficulty of bridging the wide gap in performance between high-resource (e.g., English) and low-resource (e.g., German) settings. To overcome this issue, we propose a fully language-agnostic model that does away with morphological and syntactic features to achieve robustness across languages. Our approach outperforms the state of the art in all the languages of the CoNLL-2009 benchmark dataset, especially whenever a scarce amount of training data is available. Our objective is not to reject approaches that rely on syntax, rather to set a strong and consistent language-independent baseline for future innovations in Semantic Role Labeling. We release our model code and checkpoints at https://github.com/SapienzaNLP/multi-srl.

With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation
Bianca Scarlini | Tommaso Pasini | Roberto Navigli
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Contextualized word embeddings have been employed effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. However, it is still hard to link them to structured sources of knowledge. In this paper we present ARES (context-AwaRe Embeddings of Senses), a semi-supervised approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lie in a space that is comparable to that of contextualized word vectors. ARES representations enable a simple 1 Nearest-Neighbour algorithm to outperform state-of-the-art models, not only in the English Word Sense Disambiguation task, but also in the multilingual one, whilst training on sense-annotated data in English only. We further assess the quality of our embeddings in the Word-in-Context task, where, when used as an external source of knowledge, they consistently improve the performance of a neural model, leading it to compete with other more complex architectures. ARES embeddings for all WordNet concepts and the automatically-extracted contexts used for creating the sense representations are freely available at http://sensembert.org/ares.

2019

Quasi Bidirectional Encoder Representations from Transformers for Word Sense Disambiguation
Michele Bevilacqua | Roberto Navigli
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not benefited from them yet. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations, better-fitting for the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model using ELMo.

SyntagNet: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations
Marco Maru | Federico Scozzafava | Federico Martelli | Roberto Navigli
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current research in knowledge-based Word Sense Disambiguation (WSD) indicates that performances depend heavily on the Lexical Knowledge Base (LKB) employed. This paper introduces SyntagNet, a novel resource consisting of manually disambiguated lexical-semantic combinations. By capturing sense distinctions evoked by syntagmatic relations, SyntagNet enables knowledge-based WSD systems to establish a new state of the art which challenges the hitherto unrivaled performances attained by supervised approaches. To the best of our knowledge, SyntagNet is the first large-scale manually-curated resource of this kind made available to the community (at http://syntagnet.org).

Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)
Raffaella Bernardi | Roberto Navigli | Giovanni Semeraro
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Preface
Raffaella Bernardi | Roberto Navigli | Giovanni Semeraro
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

LSTMEmbed: Learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories
Ignacio Iacobacci | Roberto Navigli
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While word embeddings are now a de facto standard representation of words in most NLP tasks, recently the attention has been shifting towards vector representations which capture the different meanings, i.e., senses, of words. In this paper we explore the capabilities of a bidirectional LSTM model to learn representations of word senses from semantically annotated corpora. We show that the utilization of an architecture that is aware of word order, like an LSTM, enables us to create better representations. We assess our proposed model on various standard benchmarks for evaluating semantic representations, reaching state-of-the-art performance on the SemEval-2014 word-to-sense similarity task. We release the code and the resulting word and sense embeddings at http://lcl.uniroma1.it/LSTMEmbed.

VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling
Andrea Di Fabio | Simone Conia | Roberto Navigli
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org.

Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation
Rocco Tripodi | Roberto Navigli
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph and the payoff as an embedding similarity function, that the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-art performances and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations.

Just “OneSeC” for Producing Multilingual Sense-Annotated Data
Bianca Scarlini | Tommaso Pasini | Roberto Navigli
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The well-known problem of knowledge acquisition is one of the biggest issues in Word Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages. In this paper we formulate the assumption of One Sense per Wikipedia Category and present OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning. Our automatically-generated data consistently lead a supervised WSD model to state-of-the-art performance when compared with other automatic and semi-automatic methods. Moreover, our approach outperforms its competitors on multilingual and domain-specific settings, where it beats the existing state of the art on all languages and most domains. All the training data are available for research purposes at http://trainomatic.org/onesec.

2018

Huge Automatically Extracted Training-Sets for Multilingual Word SenseDisambiguation
Tommaso Pasini | Francesco Elia | Roberto Navigli
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SemEval-2018 Task 9: Hypernym Discovery
Jose Camacho-Collados | Claudio Delli Bovi | Luis Espinosa-Anke | Sergio Oramas | Tommaso Pasini | Enrico Santus | Vered Shwartz | Roberto Navigli | Horacio Saggion
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at https://competitions.codalab.org/competitions/17119.

2017

Neural Sequence Learning Models for Word Sense Disambiguation
Alessandro Raganato | Claudio Delli Bovi | Roberto Navigli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Word Sense Disambiguation models exist in many flavors. Even though supervised ones tend to perform best in terms of accuracy, they often lose ground to more flexible knowledge-based solutions, which do not require training by a word expert for every disambiguation target. To bridge this gap we adopt a different perspective and rely on sequence learning to frame the disambiguation problem: we propose and study in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory to encoder-decoder models. Our extensive evaluation over standard benchmarks and in multiple languages shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against word experts with engineered features.

Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison
Alessandro Raganato | Jose Camacho-Collados | Roberto Navigli
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Word Sense Disambiguation is a long-standing task in Natural Language Processing, lying at the core of human language understanding. However, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework. In this paper we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline to beat. Nonetheless, recent approaches exploiting neural networks on unlabeled corpora achieve promising results, surpassing this hard baseline in most test sets.

BabelDomains: Large-Scale Domain Labeling of Lexical Resources
Jose Camacho-Collados | Roberto Navigli
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we present BabelDomains, a unified resource which provides lexical items with information about domains of knowledge. We propose an automatic method that uses knowledge from various lexical resources, exploiting both distributional and graph-based clues, to accurately propagate domain information. We evaluate our methodology intrinsically on two lexical resources (WordNet and BabelNet), achieving a precision over 80% in both cases. Finally, we show the potential of BabelDomains in a supervised learning setting, clustering training data by domain for hypernym discovery.

Towards a Seamless Integration of Word Senses into Downstream NLP Applications
Mohammad Taher Pilehvar | Jose Camacho-Collados | Roberto Navigli | Nigel Collier
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations which target the performance in downstream NLP applications rather than artificial benchmarks.

EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text
Claudio Delli Bovi | Jose Camacho-Collados | Alessandro Raganato | Roberto Navigli
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EuroSense, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl parallel corpus, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities from a language-independent unified sense inventory. We evaluate the quality of our sense annotations intrinsically and extrinsically, showing their effectiveness as training data for Word Sense Disambiguation.

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
Massimiliano Mancini | Jose Camacho-Collados | Ignacio Iacobacci | Roberto Navigli
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.

Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data
Tommaso Pasini | Roberto Navigli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Annotating large numbers of sentences with senses is the heaviest requirement of current Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary. The approach is fully automatic: no human intervention is required and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic achieves consistently state-of-the-art performance across gold standard datasets and languages, while at the same time removing the burden of manual annotation. All the training data is available for research purposes at http://trainomatic.org.

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
Jose Camacho-Collados | Mohammad Taher Pilehvar | Nigel Collier | Roberto Navigli
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper introduces a new task on Multilingual and Cross-lingual SemanticThis paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are best performers in both subtasks. More information can be found on the task website: http://alt.qcri.org/semeval2017/task2/

2016

Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations
José Camacho-Collados | Roberto Navigli
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

A Large-Scale Multilingual Disambiguation of Glosses
José Camacho-Collados | Claudio Delli Bovi | Alessandro Raganato | Roberto Navigli
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Linking concepts and named entities to knowledge bases has become a crucial Natural Language Understanding task. In this respect, recent works have shown the key advantage of exploiting textual definitions in various Natural Language Processing applications. However, to date there are no reliable large-scale corpora of sense-annotated textual definitions available to the research community. In this paper we present a large-scale high-quality corpus of disambiguated glosses in multiple languages, comprising sense annotations of both concepts and named entities from a unified sense inventory. Our approach for the construction and disambiguation of the corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system; first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation, and then we combine it with a semantic similarity-based refinement. As a result we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we make it freely available at http://lcl.uniroma1.it/disambiguated-glosses. Experiments on Open Information Extraction and Sense Clustering show how two state-of-the-art approaches improve their performance by integrating our disambiguated corpus into their pipeline.

Semantic Representations of Word Senses and Concepts
José Camacho-Collados | Ignacio Iacobacci | Roberto Navigli | Mohammad Taher Pilehvar
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Representing the semantics of linguistic items in a machine interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of word senses have the potential to overcome this inherent limitation. Indeed, the representation of individual word senses and concepts has recently gained in popularity with several experimental results showing that a considerable performance improvement can be achieved across different NLP applications upon moving from word level to the deeper sense and concept levels. Another interesting point regarding the representation of concepts and word senses is that these models can be seamlessly applied to other linguistic items, such as words, phrases, sentences, etc.This tutorial will first provide a brief overview of the recent literature concerning word representation (both count based and neural network based). It will then describe the advantages of moving from the word level to the deeper level of word senses and concepts, providing an extensive review of state of the art systems. Approaches covered will not only include those which draw upon knowledge resources such as WordNet, Wikipedia, BabelNet or FreeBase as reference, but also the so called multi prototype approaches which learn sense distinctions by using different clustering techniques. Our tutorial will discuss the advantages and potential limitations of all approaches, showing their most successful applications to date. We will conclude by presenting current open problems and lines of future work.

Embeddings for Word Sense Disambiguation: An Evaluation Study
Ignacio Iacobacci | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)
Deyi Xiong | Kevin Duh | Christian Hardmeier | Roberto Navigli
Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)

SensEmbed: Learning Sense Embeddings for Word and Relational Similarity
Ignacio Iacobacci | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)
Georgeta Bordea | Paul Buitelaar | Stefano Faralli | Roberto Navigli
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

An Open-source Framework for Multi-level Semantic Similarity Measurement
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis
Claudio Delli Bovi | Luca Telesca | Roberto Navigli
Transactions of the Association for Computational Linguistics, Volume 3

We present DefIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DefIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Knowledge Base Unification via Sense Embeddings and Disambiguation
Claudio Delli Bovi | Luis Espinosa-Anke | Roberto Navigli
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Reconciling Heterogeneous Descriptions of Language Resources
John P. McCrae | Philipp Cimiano | Victor Rodríguez Doncel | Daniel Vila-Suero | Jorge Gracia | Luca Matteis | Roberto Navigli | Andrejs Abele | Gabriela Vulcu | Paul Buitelaar
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

Multilinguality at Your Fingertips : BabelNet, Babelfy and Beyond !
Roberto Navigli
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées

Multilinguality is a key feature of today’s Web, and it is this feature that we leverage and exploit in our research work at the Sapienza University of Rome’s Linguistic Computing Laboratory, which I am going to overview and showcase in this talk. I will start by presenting BabelNet 3.0, available at http://babelnet.org, a very large multilingual encyclopedic dictionary and semantic network, which covers 271 languages and provides both lexicographic and encyclopedic knowledge for all the open-class parts of speech, thanks to the seamless integration of WordNet, Wikipedia, Wiktionary, OmegaWiki, Wikidata and the Open Multilingual WordNet. Next, I will present Babelfy, available at http://babelfy.org, a unified approach that leverages BabelNet to jointly perform word sense disambiguation and entity linking in arbitrary languages, with performance on both tasks on a par with, or surpassing, those of task-specific state-of-the-art supervised systems. Finally I will describe the Wikipedia Bitaxonomy, available at http://wibitaxonomy.org, a new approach to the construction of a Wikipedia bitaxonomy, that is, the largest and most accurate currently available taxonomy of Wikipedia pages and taxonomy of categories, aligned to each other. I will also give an outline of future work on multilingual resources and processing, including state-of-the-art semantic similarity with sense embeddings.

SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking
Andrea Moro | Roberto Navigli
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

NASARI: a Novel Approach to a Semantically-Aware Representation of Items
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reading Between the Lines: Overcoming Data Sparsity for Accurate Classification of Lexical Relationships
Silvia Necşulescu | Sara Mendes | David Jurgens | Núria Bel | Roberto Navigli
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

A Unified Multilingual Semantic Representation of Concepts
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Entity Linking meets Word Sense Disambiguation: a Unified Approach
Andrea Moro | Alessandro Raganato | Roberto Navigli
Transactions of the Association for Computational Linguistics, Volume 2

Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org

A Knowledge-based Representation for Cross-Language Document Retrieval and Categorization
Marc Franco-Salvador | Paolo Rosso | Roberto Navigli
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)
Johan Bos | Anette Frank | Roberto Navigli
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

A Robust Approach to Aligning Heterogeneous Lexical Resources
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project
Tiziano Flati | Daniele Vannella | Tommaso Pasini | Roberto Navigli
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It’s All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation
David Jurgens | Roberto Navigli
Transactions of the Association for Computational Linguistics, Volume 2

Annotated data is prerequisite for many NLP applications. Acquiring large-scale annotated corpora is a major bottleneck, requiring significant time and resources. Recent work has proposed turning annotation into a game to increase its appeal and lower its cost; however, current games are largely text-based and closely resemble traditional annotation tasks. We propose a new linguistic annotation paradigm that produces annotations from playing graphical video games. The effectiveness of this design is demonstrated using two video games: one to create a mapping from WordNet senses to images, and a second game that performs Word Sense Disambiguation. Both games produce accurate results. The first game yields annotation quality equal to that of experts and a cost reduction of 73% over equivalent crowdsourcing; the second game provides a 16.3% improvement in accuracy over current state-of-the-art sense disambiguation games with WordNet.

WoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification
Daniele Vannella | Tiziano Flati | Roberto Navigli
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose
Daniele Vannella | David Jurgens | Daniele Scarfini | Domenico Toscani | Roberto Navigli
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation
Mohammad Taher Pilehvar | Roberto Navigli
Computational Linguistics, Volume 40, Issue 4 - December 2014

(Digital) Goodies from the ERC Wishing Well: BabelNet, Babelfy, Video Games with a Purpose and the Wikipedia Bitaxonomy
Roberto Navigli
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Annotating the MASC Corpus with BabelNet
Andrea Moro | Roberto Navigli | Francesco Maria Tucci | Rebecca J. Passonneau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we tackle the problem of automatically annotating, with both word senses and named entities, the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text. We use BabelNet 2.0, a multilingual semantic network which integrates both lexicographic and encyclopedic knowledge, as our sense/entity inventory together with its semantic structure, to perform the aforementioned annotation task. Word sense annotated corpora have been around for more than twenty years, helping the development of Word Sense Disambiguation algorithms by providing both training and testing grounds. More recently Entity Linking has followed the same path, with the creation of huge resources containing annotated named entities. However, to date, there has been no resource that contains both kinds of annotation. In this paper we present an automatic approach for performing this annotation, together with its output on the MASC corpus. We use this corpus because its goal of integrating different types of annotations goes exactly in our same direction. Our overall aim is to stimulate research on the joint exploitation and disambiguation of word senses and named entities. Finally, we estimate the quality of our annotations using both manually-tagged named entities and word senses, obtaining an accuracy of roughly 70% for both named entities and word sense annotations.

Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0
Maud Ehrmann | Francesco Cecconi | Daniele Vannella | John McCrae | Philipp Cimiano | Roberto Navigli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent years have witnessed a surge in the amount of semantic information published on the Web. Indeed, the Web of Data, a subset of the Semantic Web, has been increasing steadily in both volume and variety, transforming the Web into a ‘global database’ in which resources are linked across sites. Linguistic fields – in a broad sense – have not been left behind, and we observe a similar trend with the growth of linguistic data collections on the so-called ‘Linguistic Linked Open Data (LLOD) cloud’. While both Semantic Web and Natural Language Processing communities can obviously take advantage of this growing and distributed linguistic knowledge base, they are today faced with a new challenge, i.e., that of facilitating multilingual access to the Web of data. In this paper we present the publication of BabelNet 2.0, a wide-coverage multilingual encyclopedic dictionary and ontology, as Linked Data. The conversion made use of lemon, a lexicon model for ontologies particularly well-suited for this enterprise. The result is an interlinked multilingual (lexical) resource which can not only be accessed on the LOD, but also be used to enrich existing datasets with linguistic information, or to support the process of mapping datasets across languages.

Multilingual Word Sense Disambiguation and Entity Linking
Roberto Navigli | Andrea Moro
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts

SemEval-2014 Task 3: Cross-Level Semantic Similarity
David Jurgens | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Mohammad Taher Pilehvar | David Jurgens | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Growing Multi-Domain Glossaries from a Few Seeds using Probabilistic Topic Models
Stefano Faralli | Roberto Navigli
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Paving the Way to a Large-scale Pseudosense-annotated Dataset
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction
Paola Velardi | Stefano Faralli | Roberto Navigli
Computational Linguistics, Volume 39, Issue 3 - September 2013

Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Antonio Di Marco | Roberto Navigli
Computational Linguistics, Volume 39, Issue 3 - September 2013

GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
Flavio De Benedictis | Stefano Faralli | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

SPred: Large-scale Harvesting of Semantic Predicates
Tiziano Flati | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Java Framework for Multilingual Definition and Hypernym Extraction
Stefano Faralli | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

SemEval-2013 Task 12: Multilingual Word Sense Disambiguation
Roberto Navigli | David Jurgens | Daniele Vannella
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

SemEval-2013 Task 11: Word Sense Induction and Disambiguation within an End-User Application
Roberto Navigli | Daniele Vannella
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the ACL 2012 System Demonstrations

Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

A New Method for Evaluating Automatically Learned Terminological Taxonomies
Paola Velardi | Roberto Navigli | Stefano Faralli | Juana Maria Ruiz Martinez
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Evaluating a taxonomy learned automatically against an existing gold standard is a very complex problem, because differences stem from the number, label, depth and ordering of the taxonomy nodes. In this paper we propose casting the problem as one of comparing two hierarchical clusters. To this end we defined a variation of the Fowlkes and Mallows measure (Fowlkes and Mallows, 1983). Our method assigns a similarity value B^i_(l,r) to the learned (l) and reference (r) taxonomy for each cut i of the corresponding anonymised hierarchies, starting from the topmost nodes down to the leaf concepts. For each cut i, the two hierarchies can be seen as two clusterings C^i_l , C^i_r of the leaf concepts. We assign a prize to early similarity values, i.e. when concepts are clustered in a similar way down to the lowest taxonomy levels (close to the leaf nodes). We apply our method to the evaluation of the taxonomy learning methods put forward by Navigli et al. (2011) and Kozareva and Hovy (2010).

A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
Stefano Faralli | Roberto Navigli
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2010

An Annotated Dataset for Extracting Definitions and Hypernyms from the Web
Roberto Navigli | Paola Velardi | Juana Maria Ruiz-Martínez
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents and analyzes an annotated corpus of definitions, created to train an algorithm for the automatic extraction of definitions and hypernyms from web documents. As an additional resource, we also include a corpus of non-definitions with syntactic patterns similar to those of definition sentences, e.g.: ""An android is a robot"" vs. ""Snowcap is unmistakable"". Domain and style independence is obtained thanks to the annotation of a large and domain-balanced corpus and to a novel pattern generalization algorithm based on word-class lattices (WCL). A lattice is a directed acyclic graph (DAG), a subclass of nondeterministic finite state automata (NFA). The lattice structure has the purpose of preserving the salient differences among distinct sequences, while eliminating redundant information. The WCL algorithm will be integrated into an improved version of the GlossExtractor Web application (Velardi et al., 2008). This paper is mostly concerned with a description of the corpus, the annotation strategy, and a linguistic analysis of the data. A summary of the WCL algorithm is also provided for the sake of completeness.

Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
Simone Paolo Ponzetto | Roberto Navigli
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Inducing Word Senses to Improve Web Search Result Clustering
Roberto Navigli | Giuseppe Crisafulli
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Learning Word-Class Lattices for Definition and Hypernym Extraction
Roberto Navigli | Paola Velardi
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

BabelNet: Building a Very Large Multilingual Semantic Network
Roberto Navigli | Simone Paolo Ponzetto
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Using Cycles and Quasi-Cycles to Disambiguate Dictionary Glosses
Roberto Navigli
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2007

SemEval-2007 Task 10: English Lexical Substitution Task
Diana McCarthy | Roberto Navigli
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

SemEval-2007 Task 07: Coarse-Grained English All-Words Task
Roberto Navigli | Kenneth C. Litkowski | Orin Hargraves
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Enriching a Formal Ontology with a Thesaurus: an Application in the Cultural Heritage Domain
Roberto Navigli | Paola Velardi
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

Valido: A Visual Tool for Validating Sense Annotations
Roberto Navigli
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Online Word Sense Disambiguation with Structural Semantic Interconnections
Roberto Navigli
Demonstrations

Reducing the Granularity of a Computational Lexicon via an Automatic Mapping to a Coarse-Grained Sense Inventory
Roberto Navigli
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

WordNet is the reference sense inventory of most of the current Word Sense Disambiguation systems. Unfortunately, it encodes too fine-grained distinctions, making it difficult even for humans to solve the ambiguity of words in context. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense groups, namely the Oxford Dictionary of English. We assess the quality of the mapping and discuss the potential of the method.

Ensemble Methods for Unsupervised WSD
Samuel Brody | Roberto Navigli | Mirella Lapata
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance
Roberto Navigli
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Squibs: Consistent Validation of Manual and Automatic Sense Annotations with the Aid of Semantic Graphs
Roberto Navigli
Computational Linguistics, Volume 32, Number 2, June 2006

Experiments on the Validation of Sense Annotations Assisted by Lexical Chains
Roberto Navigli
11th Conference of the European Chapter of the Association for Computational Linguistics

2004

Quantitative and Qualitative Evaluation of the OntoLearn Ontology Learning System
Roberto Navigli | Paola Velardi | Alessandro Cucchiarelli | Francesca Neri
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Structural semantic interconnection: a knowledge-based approach to Word Sense Disambiguation
Roberto Navigli | Paola Velardi
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites
Roberto Navigli | Paola Velardi
Computational Linguistics, Volume 30, Number 2, June 2004

Automatic Generation of Glosses in the OntoLearn System
Alessandro Cucchiarelli | Roberto Navigli | Francesca Neri | Paola Velardi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Automatic Adaptation of WordNet to Domains
Roberto Navigli | Paola Velardi
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Co-authors

Jose Camacho-Collados 13

Tommaso Pasini 10

Paola Velardi 10

Federico Martelli 9

Riccardo Orlando 9

Stefano Faralli 8

Alessandro Scirè 8

Michele Bevilacqua 7

Niccolò Campolungo 7

Abelardo Carlos Martínez Lorenzo 7

Stefano Perrella 7

Luigi Procopio 7

Andrei Stefan Bejgu 6

Claudio Delli Bovi 6

David Jurgens 6

Lorenzo Proietti 6

Daniele Vannella 6

Francesco Cecconi 5

Ignacio Iacobacci 5

Alessandro Raganato 5

Rocco Tripodi 5

Rexhina Blloshmi 4

Tommaso Bonomo 4

Fabrizio Brignone 4

Alberte Fernández-Castro 4

Giuliano Martinelli 4

Simone Paolo Ponzetto 4

Tiziano Flati 3

Luca Gioffré 3

Francesco Maria Molfese 3

Bianca Scarlini 3

Chengqing Zong 3

Raffaella Bernardi 2

Paul Buitelaar 2

Cesare Campagnano 2

Simone Ciciliano 2

Philipp Cimiano 2

Nigel Collier 2

Alessandro Cucchiarelli 2

Luis Espinosa Anke 2

Leonardo Lavalle 2

John Philip McCrae 2

Francesca Neri 2

Juana María Ruiz-Martínez 2

Federico Scozzafava 2

Giovanni Semeraro 2

Carole Tiberius 2

Andrejs Abele 1

Tosin Adewumi 1

Javier Aula-Blasco 1

Andrea Bacciu 1

Somnath Banerjee 1

Elisa Bassignana 1

Irene Baucells 1

Rohit Bharadwaj 1

Alexandra Birch 1

Matthew Blumberg 1

Georgeta Bordea 1

Richard Brutti 1

Tien-Tung Bui 1

Agostina Calabrese 1

Valentina Caruso 1

Liang-Yu Chen 1

Vu Minh Chien 1

Leonardo Colosi 1

Giuseppe Crisafulli 1

Arnav Varma Dantuluri 1

Flavio De Benedictis 1

Thierry Declerck 1

Felice Dell’Orletta 1

Luigi Di Caro 1

Antonio Di Marco 1

Aleksandr Drozd 1

Ahmed El Sheikh 1

Francesco Elia 1

Edoardo Fabiano 1

Andrea Di Fabio 1

Júlia Falcão 1

Domenico Fedele 1

Andrea Ferrari 1

Giuseppe Fiameni 1

Marc Franco-Salvador 1

Felix Friedrich 1

Tommaso Furlanello 1

Andrea Galassi 1

Apolonija Gantar 1

Iacopo Ghinassi 1

Aitor González-Agirre 1

Kshitij Gupta 1

Christian Hardmeier 1

Orin Hargraves 1

Daniel Hershcovich 1

Eben Holderness 1

Adalberto Barbosa Junior 1

Jelena Kallas 1

Marzena Karpinska 1

Sedrick Scott Keh 1

Svetla Peneva Koeva 1

Alexander Koller 1

Kristina Koppel 1

Wojciech Kusa 1

Caterina Lacerra 1

Veronika Laippala 1

Margit Langemets 1

Mirella Lapata 1

Valentina Leone 1

Veronika Lipp 1

Kenneth C. Litkowski 1

Marco Lo Pinto 1

Bernardo Magnini 1

Valentino Maiorca 1

Massimiliano Mancini 1

Paola Marongiu 1

Diana McCarthy 1

Barbara McGillivray 1

Domenico Meconi 1

Virendra Mehta 1

Alessio Miaschi 1

Mayank Mishra 1

Diganta Misra 1

Francesco Molfese 1

Nour Moustafa-Fahmy 1

Niklas Muennighoff 1

Taishi Nakamura 1

Silvia Necşulescu 1

Axel-Cyrille Ngonga Ngomo 1

Sergio Oramas 1

Sergio Orlandini 1

Malte Ostendorff 1

Gianmarco Pappacoda 1

Gabriella Pasi 1

Rebecca J. Passonneau 1

Saloni Potdar 1

Giovanni Puccetti 1

Antonio Purificato 1

James Pustejovsky 1

Sampo Pyysalo 1

Naiara Pérez 1

Valeria Quochi 1

Kyeongmin Rim 1

Victor Rodriguez-Doncel 1

Dennis Rotondi 1

Horacio Saggion 1

Francesco Saina 1

Andrea Sanchietti 1

Bolette Sanford Pedersen 1

Enrico Santus 1

Daniele Scarfini 1

Steven Schockaert 1

Rico Sennrich 1

Ekaterina Shutova 1

Vered Shwartz 1

Pasquale Silvestri 1

László Simon 1

Giovanni Siragusa 1

Jason T. Stillerman 1

Simone Stirpe 1

Marco Antonio Stranisci 1

Silvia Paniagua Suárez 1

Simone Teglia 1

Gabriele Tola 1

Giovanni Torrisi 1

Paolo Torroni 1

Domenico Toscani 1

Francesco Maria Tucci 1

Rafael-J. Ureña-Ruiz 1

Pavlo Vasylenko 1

Daniel Vila-Suero 1

Marta Villegas 1

Gabriela Vulcu 1

Giulia Vulpis 1

Deyi Xiong (德意熊) 1

Prateek Yadav 1

Davide Zanfardino 1

Roberto Zanoli 1

Terry Yue Zhuo 1

Vilém Zouhar 1

Venues

JEP/TALN/RECITAL1