Giuseppe Castellucci


2021

pdf bib
VoiSeR: A New Benchmark for Voice-Based Search Refinement
Simone Filice | Giuseppe Castellucci | Marcus Collins | Eugene Agichtein | Oleg Rokhlenko
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Voice assistants, e.g., Alexa or Google Assistant, have dramatically improved in recent years. Supporting voice-based search, exploration, and refinement are fundamental tasks for voice assistants, and remain an open challenge. For example, when using voice to search an online shopping site, a user often needs to refine their search by some aspect or facet. This common user intent is usually available through a “filter-by” interface on online shopping websites, but is challenging to support naturally via voice, as the intent of refinements must be interpreted in the context of the original search, the initial results, and the available product catalogue facets. To our knowledge, no benchmark dataset exists for training or validating such contextual search understanding models. To bridge this gap, we introduce the first large-scale dataset of voice-based search refinements, VoiSeR, consisting of about 10,000 search refinement utterances, collected using a novel crowdsourcing task. These utterances are intended to refine a previous search, with respect to a search facet or attribute (e.g., brand, color, review rating, etc.), and are manually annotated with the specific intent. This paper reports qualitative and empirical insights into the most common and challenging types of refinements that a voice-based conversational search system must support. As we show, VoiSeR can support research in conversational query understanding, contextual user intent prediction, and other conversational search topics to facilitate the development of conversational search systems.

pdf bib
Learning to Solve NLP Tasks in an Incremental Number of Languages
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In real scenarios, a multilingual model trained to solve NLP tasks on a set of languages can be required to support new languages over time. Unfortunately, the straightforward retraining on a dataset containing annotated examples for all the languages is both expensive and time-consuming, especially when the number of target languages grows. Moreover, the original annotated material may no longer be available due to storage or business constraints. Re-training only with the new language data will inevitably result in Catastrophic Forgetting of previously acquired knowledge. We propose a Continual Learning strategy that updates a model to support new languages over time, while maintaining consistent results on previously learned languages. We define a Teacher-Student framework where the existing model “teaches” to a student model its knowledge about the languages it supports, while the student is also trained on a new language. We report an experimental evaluation in several tasks including Sentence Classification, Relational Learning and Sequence Labeling.

2020

pdf bib
GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples
Danilo Croce | Giuseppe Castellucci | Roberto Basili
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent Transformer-based architectures, e.g., BERT, provide impressive results in many Natural Language Processing tasks. However, most of the adopted benchmarks are made of (sometimes hundreds of) thousands of examples. In many real scenarios, obtaining high- quality annotated data is expensive and time consuming; in contrast, unlabeled examples characterizing the target task can be, in general, easily collected. One promising method to enable semi-supervised learning has been proposed in image processing, based on Semi- Supervised Generative Adversarial Networks. In this paper, we propose GAN-BERT that ex- tends the fine-tuning of BERT-like architectures with unlabeled data in a generative adversarial setting. Experimental results show that the requirement for annotated examples can be drastically reduced (up to only 50-100 annotated examples), still obtaining good performances in several sentence classification tasks.

pdf bib
Proceedings of the First International Workshop on Natural Language Processing Beyond Text
Giuseppe Castellucci | Simone Filice | Soujanya Poria | Erik Cambria | Lucia Specia
Proceedings of the First International Workshop on Natural Language Processing Beyond Text

2017

pdf bib
Deep Learning in Semantic Kernel Spaces
Danilo Croce | Simone Filice | Giuseppe Castellucci | Roberto Basili
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Kernel methods enable the direct usage of structured representations of textual data during language learning and inference tasks. Expressive kernels, such as Tree Kernels, achieve excellent performance in NLP. On the other side, deep neural networks have been demonstrated effective in automatically learning feature representations during training. However, their input is tensor data, i.e., they can not manage rich structured information. In this paper, we show that expressive kernels and deep neural networks can be combined in a common framework in order to (i) explicitly model structured information and (ii) learn non-linear decision functions. We show that the input layer of a deep architecture can be pre-trained through the application of the Nystrom low-rank approximation of kernel spaces. The resulting “kernelized” neural network achieves state-of-the-art accuracy in three different tasks.

2016

pdf bib
A Language Independent Method for Generating Large Scale Polarity Lexicons
Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentiment Analysis systems aims at detecting opinions and sentiments that are expressed in texts. Many approaches in literature are based on resources that model the prior polarity of words or multi-word expressions, i.e. a polarity lexicon. Such resources are defined by teams of annotators, i.e. a manual annotation is provided to associate emotional or sentiment facets to the lexicon entries. The development of such lexicons is an expensive and language dependent process, making them often not covering all the linguistic sentiment phenomena. Moreover, once a lexicon is defined it can hardly be adopted in a different language or even a different domain. In this paper, we present several Distributional Polarity Lexicons (DPLs), i.e. large-scale polarity lexicons acquired with an unsupervised methodology based on Distributional Models of Lexical Semantics. Given a set of heuristically annotated sentences from Twitter, we transfer the sentiment information from sentences to words. The approach is mostly unsupervised, and experimental evaluations on Sentiment Analysis tasks in two languages show the benefits of the generated resources. The generated DPLs are publicly available in English and Italian.

2015

pdf bib
KeLP: a Kernel-based Learning Platform for Natural Language Processing
Simone Filice | Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

pdf bib
UNITOR: Aspect Based Sentiment Analysis with Structured Learning
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
HuRIC: a Human Robot Interaction Corpus
Emanuele Bastianelli | Giuseppe Castellucci | Danilo Croce | Luca Iocchi | Roberto Basili | Daniele Nardi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent years show the development of large scale resources (e.g. FrameNet for the Frame Semantics) that supported the definition of several state-of-the-art approaches in Natural Language Processing. However, the reuse of existing resources in heterogeneous domains such as Human Robot Interaction is not straightforward. The generalization offered by many data driven methods is strongly biased by the employed data, whose performance in out-of-domain conditions exhibit large drops. In this paper, we present the Human Robot Interaction Corpus (HuRIC). It is made of audio files paired with their transcriptions referring to commands for a robot, e.g. in a home environment. The recorded sentences are annotated with different kinds of linguistic information, ranging from morphological and syntactic information to rich semantic information, according to the Frame Semantics, to characterize robot actions, and Spatial Semantics, to capture the robot environment. All texts are represented through the Abstract Meaning Representation, to adopt a simple but expressive representation of commands, that can be easily translated into the internal representation of the robot.

2013

pdf bib
Textual Inference and Meaning Representation in Human Robot Interaction
Emanuele Bastianelli | Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

pdf bib
UNITOR: Combining Syntactic and Semantic Kernels for Twitter Sentiment Analysis
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)