2024
pdf
bib
abs
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
|
Matteo Testa
|
Jurgen Van De Walle
|
Luigi Di Caro
Findings of the Association for Computational Linguistics: EMNLP 2024
The creation of artificial polyglot voices remains a challenging task, despite considerable progress in recent years. This paper investigates self-supervised learning for voice conversion to create native-sounding polyglot voices. We introduce a novel cross-lingual any-to-one voice conversion system that is able to preserve the source accent without the need for multilingual data from the target speaker. In addition, we show a novel cross-lingual fine-tuning strategy that further improves the accent and reduces the training data requirements. Objective and subjective evaluations with English, Spanish, French and Mandarin Chinese confirm that our approach improves on state-of-the-art methods, enhancing the speech intelligibility and overall quality of the converted speech, especially in cross-lingual scenarios. Audio samples are available at: https://giuseppe-ruggiero.github.io/a2o-vc-demo/
pdf
bib
abs
EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection
Francesca Grasso
|
Stefano Locci
|
Giovanni Siragusa
|
Luigi Di Caro
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Mainstream NLP tasks, such as sentiment analysis, dominate the scene, but there remains an untouched space in the literature involving the analysis of environmental impacts of certain events and practices. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification, Stance Detection, and introducing an original approach for Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. Remarkable Inter-Annotator Agreement indicates that the annotation scheme produces consistent annotations of high quality. Subsequent classification experiments using BERT-based models, including ClimateBERT, are presented. These yield encouraging results, while also indicating room for a model specifically tailored for environmental texts. The dataset is made freely available to stimulate further research.
2020
pdf
bib
abs
Populating Legal Ontologies using Semantic Role Labeling
Llio Humphreys
|
Guido Boella
|
Luigi Di Caro
|
Livio Robaldo
|
Leon van der Torre
|
Sepideh Ghanavati
|
Robert Muthuri
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper is concerned with the goal of maintaining legal information and compliance systems: the ‘resource consumption bottleneck’ of creating semantic technologies manually. The use of automated information extraction techniques could significantly reduce this bottleneck. The research question of this paper is: How to address the resource bottleneck problem of creating specialist knowledge management systems? In particular, how to semi-automate the extraction of norms and their elements to populate legal ontologies? This paper shows that the acquisition paradox can be addressed by combining state-of-the-art general-purpose NLP modules with pre- and post-processing using rules based on domain knowledge. It describes a Semantic Role Labeling based information extraction system to extract norms from legislation and represent them as structured norms in legal ontologies. The output is intended to help make laws more accessible, understandable, and searchable in legal document management systems such as Eunomos (Boella et al., 2016).
pdf
bib
abs
Building Semantic Grams of Human Knowledge
Valentina Leone
|
Giovanni Siragusa
|
Luigi Di Caro
|
Roberto Navigli
Proceedings of the Twelfth Language Resources and Evaluation Conference
Word senses are typically defined with textual definitions for human consumption and, in computational lexicons, put in context via lexical-semantic relations such as synonymy, antonymy, hypernymy, etc. In this paper we embrace a radically different paradigm that provides a slot-filler structure, called “semagram”, to define the meaning of words in terms of their prototypical semantic information. We propose a semagram-based knowledge model composed of 26 semantic relationships which integrates features from a range of different sources, such as computational lexicons and property norms. We describe an annotation exercise regarding 50 concepts over 10 different categories and put forward different automated approaches for extending the semagram base to thousands of concepts. We finally evaluated the impact of the proposed resource on a semantic similarity task, showing significant improvements over state-of-the-art word embeddings.
2019
pdf
bib
abs
Real Life Application of a Question Answering System Using BERT Language Model
Francesca Alloatti
|
Luigi Di Caro
|
Gianpiero Sportelli
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
It is often hard to apply the newest advances in research to real life scenarios. They usually require the resolution of some specific task applied to a restricted domain, all the while providing small amounts of data to begin with. In this study we apply one of the newest innovations in Deep Learning to a task of text classification. We created a question answering system in Italian that provides information about a specific subject, e-invoicing and digital billing. Italy recently introduced a new legislation about e-invoicing and people have some legit doubts, therefore a large share of professionals could benefit from this tool.
2016
pdf
bib
NORMAS at SemEval-2016 Task 1: SEMSIM: A Multi-Feature Approach to Semantic Text Similarity
Kolawole Adebayo
|
Luigi Di Caro
|
Guido Boella
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
pdf
bib
abs
Automatic Enrichment of WordNet with Common-Sense Knowledge
Luigi Di Caro
|
Guido Boella
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
WordNet represents a cornerstone in the Computational Linguistics field, linking words to meanings (or senses) through a taxonomical representation of synsets, i.e., clusters of words with an equivalent meaning in a specific context often described by few definitions (or glosses) and examples. Most of the approaches to the Word Sense Disambiguation task fully rely on these short texts as a source of contextual information to match with the input text to disambiguate. This paper presents the first attempt to enrich synsets data with common-sense definitions, automatically retrieved from ConceptNet 5, and disambiguated accordingly to WordNet. The aim was to exploit the shared- and immediate-thinking nature of common-sense knowledge to extend the short but incredibly useful contextual information of the synsets. A manual evaluation on a subset of the entire result (which counts a total of almost 600K synset enrichments) shows a very high precision with an estimated good recall.
2014
pdf
bib
abs
Exploiting networks in Law
Livio Robaldo
|
Guido Boella
|
Luigi Di Caro
|
Andrea Violato
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we first introduce the working context related to the understanding of an heterogeneous network of references contained in the Italian regulatory framework. We then present an extended analysis of a large network of laws, providing several types of analytical evaluation that can be used within a legal management system for understanding the data through summarization, visualization, and browsing. In the legal domain, yet several tasks are strictly supervised by humans, with strong consumption of time and energy that would dramatically drop with the help of automatic or semi-automatic supporting tools. We overview different techniques and methodologies explaining how they can be helpful in actual scenarios.
2013
pdf
bib
Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines
Guido Boella
|
Luigi Di Caro
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2012
pdf
bib
abs
NLP Challenges for Eunomos a Tool to Build and Manage Legal Knowledge
Guido Boella
|
Luigi di Caro
|
Llio Humphreys
|
Livio Robaldo
|
Leon van der Torre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper, we describe how NLP can semi-automate the construction and analysis of knowledge in Eunomos, a legal knowledge management service which enables users to view legislation from various sources and find the right definitions and explanations of legal concepts in a given context. NLP can semi-automate some routine tasks currently performed by knowledge engineers, such as classifying norm, or linking key terms within legislation to ontological concepts. This helps overcome the resource bottleneck problem of creating specialist knowledge management systems. While accuracy is of the utmost importance in the legal domain, and the information should be verified by domain experts as a matter of course, a semi-automated approach can result in considerable efficiency gains.