Roberto Basili - ACL Anthology

Roberto Basili

Also published as: R. Basili

2025

Modeling Background Knowledge with Frame Semantics for Fine-grained Sentiment Classification
Muhammad Okky Ibrohim | Valerio Basile | Danilo Croce | Cristina Bosco | Roberto Basili
Proceedings of the 2nd Workshop on Analogical Abstraction in Cognition, Perception, and Language (Analogy-Angle II)

Few-shot learning via in-context learning (ICL) is widely used in NLP, but its effectiveness is highly sensitive to example selection, often leading to unstable performance. To address this, we introduce BacKGen, a framework for generating structured Background Knowledge (BK) as an alternative to instance-based prompting. Our approach leverages Frame Semantics to uncover recurring conceptual patterns across data instances, clustering examples based on shared event structures and semantic roles. These patterns are then synthesized into generalized knowledge statements using a large language model (LLM) and injected into prompts to support contextual reasoning beyond surface-level cues. We apply BacKGen to Sentiment Phrase Classification (SPC), a task where polarity judgments frequently depend on implicit commonsense knowledge. In this setting, BK serves as an abstract representation of prototypical scenarios, enabling schematic generalization to help the model perform analogical reasoning by mapping new inputs onto generalized event structures. Experimental results with Mistral-7B and Llama3-8B demonstrate that BK-based prompting consistently outperforms standard few-shot approaches, achieving up to 29.94% error reduction.

Evaluating Large Language Models on Wikipedia Graph Navigation: Insights from the WikiGame
Daniele Margiotta | Danilo Croce | Roberto Basili
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Automatic GRI-SDG Annotation and LLM-Based Filtering for Sustainability Reports
Seyed Alireza Mousavian Anaraki | Danilo Croce | Roberto Basili
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands
Claudiu Daniel Hromei | Antonio Scaiella | Danilo Croce | Roberto Basili
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Understanding natural language commands in situated Human-Robot Interaction (HRI) requires linking linguistic input to perceptual context. Traditional symbolic parsers lack the flexibility to operate in complex, dynamic environments. We introduce a novel Multimodal Grounded Semantic Role Labelling (G-SRL) framework that combines frame semantics with perceptual grounding, enabling robots to interpret commands via multimodal logical forms. Our approach leverages modern Visual Language Models (VLLMs), which jointly process text and images, and is supported by an automated pipeline that generates high-quality training data. Structured command annotations are converted into photorealistic scenes via LLM-guided prompt engineering and diffusion models, then rigorously validated through object detection and visual question answering. The pipeline produces over 11,000 image-command pairs (3,500+ manually validated), while approaching the quality of manually curated datasets at significantly lower cost.

Sanskrit Voyager: Unified Web Platform for Interactive Reading and Linguistic Analysis of Sanskrit Texts
Giacomo De Luca | Danilo Croce | Roberto Basili
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Sanskrit Voyager is a web application for searching, reading, and analyzing the texts in the Sanskrit literary corpus. Unlike previous tools that require expert linguistic knowledge or manual normalization, Sanskrit Voyager enables users to search for words and phrases as they actually appear in texts, handling inflection, sandhi, and compound forms automatically while supporting any transliteration. The system integrates four core functionalities: (1) multi-dictionary lookup with morphological analysis and inflection tables; (2) real-time text parsing and annotation; (3) an interactive reader for over 900 digitalized texts; and (4) advanced corpus search with fuzzy matching and filtering. Evaluation shows over 92% parsing accuracy on complex compounds and substantially higher search recall than BuddhaNexus on challenging queries. The source code is publicly available under a CC-BY-NC license. The system is resource-efficient and designed for both learners and researchers, offering the first fully integrated, user-friendly platform for computational Sanskrit studies.

Training Multi-Modal LLMs through Dialogue Planning for HRI
Claudiu Daniel Hromei | Federico Borazio | Andrea Sensi | Elisa Passone | Danilo Croce | Roberto Basili
Findings of the Association for Computational Linguistics: ACL 2025

Grounded natural language understanding in Human-Robot Interaction (HRI) requires integrating linguistic, visual, and world knowledge to ensure effective task execution. We propose an approach that enhances Multi-Modal Large Language Models (MLLMs) with a novel explicit dialogue planning phase, allowing robotic agents to systematically refine their understanding of ambiguous commands through structured clarification steps. This reduces hallucinations and improves task feasibility. To evaluate this approach, we introduce a novel dataset of over 1,100 annotated dialogues in English and Italian, designed for fine-tuning and assessing Multi-Modal models in HRI scenarios. Experimental results show that dialogue planning improves response accuracy and quality, and contributes to cross-lingual generalisation, enabling models trained in one language to transfer effectively to another. To the best of our knowledge, this is the first application of structured, goal-driven, and explicit dialogue planning in Multi-Modal LLMs for grounded interaction.

Unsupervised Sustainability Report Labeling based on the integration of the GRI and SDG standards
Seyed Alireza Mousavian Anaraki | Danilo Croce | Roberto Basili
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)

Sustainability reports are key instruments for communicating corporate impact, but their unstructured format and varied content pose challenges for large-scale analysis. This paper presents an unsupervised method to annotate paragraphs from sustainability reports against both the Global Reporting Initiative (GRI) and Sustainable Development Goals (SDG) standards. The approach combines structured metadata from GRI content indexes, official GRI–SDG mappings, and text semantic similarity models to produce weakly supervised annotations at scale. To evaluate the quality of these annotations, we train a multi-label classifier on the automatically labeled data and evaluate it on the trusted OSDG Community Dataset. The results show that our method yields meaningful labels and improves classification performance when combined with human-annotated data. Although preliminary, this work offers a foundation for scalable sustainability analysis and opens future directions toward assessing the credibility and depth of corporate sustainability claims.

Injecting Frame Semantics into Large Language Models via Prompt-Based Fine-Tuning
Shahid Iqbal Rai | Danilo Croce | Roberto Basili
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

Large Language Models (LLMs) have demonstrated remarkable generalization across diverse NLP tasks, yet they often produce outputs lacking semantic coherence due to insufficient grounding in structured linguistic knowledge. This paper proposes a novel method for injecting Frame Semantics into a pretrained LLaMA model using Low-Rank Adaptation (LoRA). Leveraging FrameNet (a rich resource of over 1,000 semantic frames), we construct a training corpus comprising structured triples of frame definitions, frame elements, and lexical units. Our method encodes these examples into the model via LoRA adapters and evaluates performance using zero-shot prompting for textual entailment and semantic role labeling (SRL) over FrameNet. Experimental results show that our adapted frame-aware LLM substantially outperforms the baseline across closed, open-ended, and multiple-choice prompts. Moreover, we observe significant improvements in SRL accuracy, demonstrating the efficacy of combining frame-semantic theory with parameter-efficient pretraining.

2024

La Non Canonica L’hai Studiata? Exploring LLMs and Sentence Canonicity in Italian
Claudiu Daniel Hromei | Danilo Croce | Rodolfo Delmonte | Roberto Basili
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

This paper investigates the ability of Large Language Models (LLMs) to differentiate between canonical and non-canonical sentences in Italian, employing advanced neural architectures like LLaMA and its adaptations. Canonical sentences adhere to the standard Subject-Verb-Object (SVO) structure. We hypothesize that recent generative LLMs are influenced heavily by the English language, where non-canonical structures are very rare. Using the in-context learning technique, we probe these models and further fine-tune them for this specific task. Initial results indicate that these models continue to struggle with this task even after fine-tuning. Additionally, we introduce a new dataset comprising several hundred sentences from the poetry domain, which presents significant challenges for the canonical structure task.

MM-IGLU: Multi-Modal Interactive Grounded Language Understanding
Claudiu Daniel Hromei | Daniele Margiotta | Danilo Croce | Roberto Basili
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper explores Interactive Grounded Language Understanding (IGLU) challenges within Human-Robot Interaction (HRI). In this setting, a robot interprets user commands related to its environment, aiming to discern whether a specific command can be executed. If faced with ambiguities or incomplete data, the robot poses relevant clarification questions. Drawing from the NeurIPS 2022 IGLU competition, we enrich the dataset by introducing our multi-modal data and natural language descriptions in MM-IGLU: Multi-Modal Interactive Grounded Language Understanding. Utilizing a BART-based model that integrates the user’s statement with the environment’s description, and a cutting-edge Multi-Modal Large Language Model that merges both visual and textual data, we offer a valuable resource for ongoing research in the domain. Additionally, we discuss the evaluation methods for such tasks, highlighting potential limitations imposed by traditional string-match-based evaluations on this intricate multi-modal challenge. Moreover, we provide an evaluation benchmark based on human judgment to address the limits and capabilities of such baseline models. This resource is released on a dedicated GitHub repository at https://github.com/crux82/MM-IGLU.

This paper introduces a novel framework to harness Large Language Models (LLMs) for Epidemic Intelligence, focusing on identifying and categorizing emergent socio-political phenomena within health crises, with a spotlight on the COVID-19 pandemic. Our approach diverges from traditional methods, such as Topic Models, by providing explicit support to analysts through the identification of distinct thematic areas and the generation of clear, actionable statements for each topic. This supports a Zero-shot Classification mechanism, enabling effective matching of news articles to fine-grained topics without the need for model fine-tuning. The framework is designed to be as transparent as possible, producing linguistically informed insights to make the analysis more accessible to analysts who may not be familiar with every subject matter of inherently emerging phenomena. This process not only enhances the precision and relevance of the extracted Epidemic Intelligence but also fosters a collaborative environment where the system’s linguistic abilities and the analyst’s domain expertise are integrated.

2023

End-to-end Dependency Parsing via Auto-regressive Large Language Model
Claudiu Daniel Hromei | Danilo Croce | Roberto Basili
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

Learning to Generate Examples for Semantic Processing Tasks
Danilo Croce | Simone Filice | Giuseppe Castellucci | Roberto Basili
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Even if recent Transformer-based architectures, such as BERT, achieved impressive results in semantic processing tasks, their fine-tuning stage still requires large scale training resources. Usually, Data Augmentation (DA) techniques can help to deal with low resource settings. In Text Classification tasks, the objective of DA is the generation of well-formed sentences that i) represent the desired task category and ii) are novel with respect to existing sentences. In this paper, we propose a neural approach to automatically learn to generate new examples using a pre-trained sequence-to-sequence model. We first learn a task-oriented similarity function that we use to pair similar examples. Then, we use these example pairs to train a model to generate examples. Experiments in low resource settings show that augmenting the training material with the proposed strategy systematically improves the results on text classification and natural language inference tasks by up to 10% accuracy, outperforming existing DA approaches.

2021

Learning to Solve NLP Tasks in an Incremental Number of Languages
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In real scenarios, a multilingual model trained to solve NLP tasks on a set of languages can be required to support new languages over time. Unfortunately, the straightforward retraining on a dataset containing annotated examples for all the languages is both expensive and time-consuming, especially when the number of target languages grows. Moreover, the original annotated material may no longer be available due to storage or business constraints. Retraining only with the new language data will inevitably result in Catastrophic Forgetting of previously acquired knowledge. We propose a Continual Learning strategy that updates a model to support new languages over time, while maintaining consistent results on previously learned languages. We define a Teacher-Student framework where the existing model “teaches” a student model its knowledge about the languages it supports, while the student is also trained on a new language. We report an experimental evaluation in several tasks including Sentence Classification, Relational Learning and Sequence Labeling.

GQA-it: Italian Question Answering on Image Scene Graphs
Danilo Croce | Lucia C. Passaro | Alessandro Lenci | Roberto Basili
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples
Danilo Croce | Giuseppe Castellucci | Roberto Basili
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent Transformer-based architectures, e.g., BERT, provide impressive results in many Natural Language Processing tasks. However, most of the adopted benchmarks are made of (sometimes hundreds of) thousands of examples. In many real scenarios, obtaining high-quality annotated data is expensive and time consuming; in contrast, unlabeled examples characterizing the target task can be, in general, easily collected. One promising method to enable semi-supervised learning has been proposed in image processing, based on Semi-Supervised Generative Adversarial Networks. In this paper, we propose GAN-BERT that extends the fine-tuning of BERT-like architectures with unlabeled data in a generative adversarial setting. Experimental results show that the requirement for annotated examples can be drastically reduced (up to only 50-100 annotated examples), still obtaining good performances in several sentence classification tasks.

Automatic Induction of FrameNet lexical units in Italian
Silvia Brambilla | Danilo Croce | Fabio Tamburini | Roberto Basili
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

Deep Bidirectional Transformers for Italian Question Answering
Danilo Croce | Giorgio Brandi | Roberto Basili
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Auditing Deep Learning processes through Kernel-based Explanatory Models
Danilo Croce | Daniele Rossini | Roberto Basili
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

As NLP systems become more pervasive, their accountability gains value as a focal point of effort. Epistemological opaqueness of nonlinear learning methods, such as deep learning models, can be a major drawback for their adoption. In this paper, we discuss the application of Layerwise Relevance Propagation over a linguistically motivated neural architecture, the Kernel-based Deep Architecture, in order to trace back connections between linguistic properties of input instances and system decisions. Such connections then guide the construction of arguments about the network’s inferences, i.e., explanations based on real examples, semantically related to the input. We propose here a methodology to evaluate the transparency and coherence of analogy-based explanations modeling an audit stage for the system. Quantitative analysis on two semantic tasks, i.e., question classification and semantic role labeling, shows that the explanatory capabilities (native in KDAs) are effective and pave the way to more complex argumentation methods.

2018

On the Readability of Deep Learning Models: the role of Kernel-based Deep Architectures
Danilo Croce | Daniele Rossini | Roberto Basili
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Explaining non-linear Classifier Decisions within Kernel-based Deep Architectures
Danilo Croce | Daniele Rossini | Roberto Basili
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Nonlinear methods such as deep neural networks achieve state-of-the-art performances in several semantic NLP tasks. However, they do not provide epistemologically transparent decisions, owing to the limited interpretability of the underlying acquired neural models. In neural-based semantic inference tasks, epistemological transparency corresponds to the ability to trace back causal connections between the linguistic properties of an input instance and the produced classification output. In this paper, we propose the use of a methodology, called Layerwise Relevance Propagation, over linguistically motivated neural architectures, namely Kernel-based Deep Architectures (KDA), to guide argumentations and explanation inferences. In such a way, each decision provided by a KDA can be linked to real examples, linguistically related to the input instance: these can be used to motivate the network output. Quantitative analysis shows that richer explanations about the semantic and syntagmatic structures of the examples characterize more convincing arguments in two tasks, i.e. question classification and semantic role labeling.

2017

Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)
Roberto Basili | Malvina Nissim | Giorgio Satta
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Preface
Roberto Basili | Malvina Nissim | Giorgio Satta
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Monitoring Adolescents’ Distress using Social Web data as a Source: the InsideOut Project
Roberto Basili | Valentina Bellomaria | Niels Jonas Bugge | Danilo Croce | Francesco De Michele | Federico Fiori Nastro | Paolo Fiori Nastro | Chantal Michel | Stefanie Schmidt | Frauke Schultze-Lutter
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Developing a Large Scale FrameNet for Italian: the IFrameNet Experience
Silvia Brambilla | Danilo Croce | Fabio Tamburini | Roberto Basili
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Deep Learning for Automatic Image Captioning in Poor Training Conditions
Caterina Masotti | Danilo Croce | Roberto Basili
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Deep Learning in Semantic Kernel Spaces
Danilo Croce | Simone Filice | Giuseppe Castellucci | Roberto Basili
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Kernel methods enable the direct usage of structured representations of textual data during language learning and inference tasks. Expressive kernels, such as Tree Kernels, achieve excellent performance in NLP. On the other hand, deep neural networks have proven effective in automatically learning feature representations during training. However, their input is tensor data, i.e., they cannot manage rich structured information. In this paper, we show that expressive kernels and deep neural networks can be combined in a common framework in order to (i) explicitly model structured information and (ii) learn non-linear decision functions. We show that the input layer of a deep architecture can be pre-trained through the application of the Nyström low-rank approximation of kernel spaces. The resulting “kernelized” neural network achieves state-of-the-art accuracy in three different tasks.
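
As a rough illustration of the Nyström idea mentioned in this abstract, the sketch below embeds examples into a low-dimensional space whose dot products approximate a kernel. It is not the authors' implementation: the RBF kernel on dense vectors stands in for the tree kernels discussed in the paper, and the names `rbf_kernel`, `nystrom_fit`, `nystrom_transform`, the landmark count, and the `gamma` value are all assumptions made for this example. In a setup like the one described, the resulting vectors would play the role of the input layer's output.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Stand-in kernel (assumption): the paper uses structured kernels such as Tree Kernels.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_fit(landmarks, kernel):
    # Eigendecompose the landmark Gram matrix W = U diag(s) U^T.
    w = kernel(landmarks, landmarks)
    s, u = np.linalg.eigh(w)
    keep = s > 1e-10  # drop numerically null directions
    # Columns of U scaled by s^{-1/2}: projects kernel rows into the Nystrom space.
    return u[:, keep] / np.sqrt(s[keep])

def nystrom_transform(x, landmarks, projection, kernel):
    # Each example is described by its kernel values against the landmarks, then projected.
    return kernel(x, landmarks) @ projection

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
landmarks = data[:10]  # hypothetical choice: the first 10 examples act as landmarks

projection = nystrom_fit(landmarks, rbf_kernel)
embedded = nystrom_transform(data, landmarks, projection, rbf_kernel)

approx_gram = embedded @ embedded.T  # low-rank approximation of the full Gram matrix
exact_gram = rbf_kernel(data, data)
```

The approximation is exact on the landmark examples themselves and degrades gracefully elsewhere; more landmarks give a closer fit at a higher cost.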

Structured Learning for Context-aware Spoken Language Understanding of Robotic Commands
Andrea Vanzo | Danilo Croce | Roberto Basili | Daniele Nardi
Proceedings of the First Workshop on Language Grounding for Robotics

Service robots are expected to operate in specific environments, where the presence of humans plays a key role. A major feature of such robotics platforms is thus the ability to react to spoken commands. This requires the understanding of the user utterance with an accuracy able to trigger the robot reaction. Such correct interpretation of linguistic exchanges depends on physical, cognitive and language-dependent aspects related to the environment. In this work, we present the empirical evaluation of an adaptive Spoken Language Understanding chain for robotic commands, that explicitly depends on the operational environment during both the learning and recognition stages. The effectiveness of such a context-sensitive command interpretation is tested against an extension of an already existing corpus of commands, that introduced explicit perceptual knowledge: this enabled deeper measures proving that more accurate disambiguation capabilities can be actually obtained.

2016

A Language Independent Method for Generating Large Scale Polarity Lexicons
Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentiment Analysis systems aim at detecting opinions and sentiments that are expressed in texts. Many approaches in the literature are based on resources that model the prior polarity of words or multi-word expressions, i.e. a polarity lexicon. Such resources are defined by teams of annotators, i.e. a manual annotation is provided to associate emotional or sentiment facets to the lexicon entries. The development of such lexicons is an expensive and language-dependent process, so they often fail to cover all the linguistic sentiment phenomena. Moreover, once a lexicon is defined it can hardly be adopted in a different language or even a different domain. In this paper, we present several Distributional Polarity Lexicons (DPLs), i.e. large-scale polarity lexicons acquired with an unsupervised methodology based on Distributional Models of Lexical Semantics. Given a set of heuristically annotated sentences from Twitter, we transfer the sentiment information from sentences to words. The approach is mostly unsupervised, and experimental evaluations on Sentiment Analysis tasks in two languages show the benefits of the generated resources. The generated DPLs are publicly available in English and Italian.

KeLP at SemEval-2016 Task 3: Learning Semantic Relations between Questions and Answers
Simone Filice | Danilo Croce | Alessandro Moschitti | Roberto Basili
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

KeLP: a Kernel-based Learning Platform for Natural Language Processing
Simone Filice | Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

A context-based model for Sentiment Analysis in Twitter
Andrea Vanzo | Danilo Croce | Roberto Basili
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

HuRIC: a Human Robot Interaction Corpus
Emanuele Bastianelli | Giuseppe Castellucci | Danilo Croce | Luca Iocchi | Roberto Basili | Daniele Nardi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent years have seen the development of large-scale resources (e.g. FrameNet for Frame Semantics) that supported the definition of several state-of-the-art approaches in Natural Language Processing. However, the reuse of existing resources in heterogeneous domains such as Human Robot Interaction is not straightforward. The generalization offered by many data-driven methods is strongly biased by the employed data, and their performance exhibits large drops in out-of-domain conditions. In this paper, we present the Human Robot Interaction Corpus (HuRIC). It is made of audio files paired with their transcriptions referring to commands for a robot, e.g. in a home environment. The recorded sentences are annotated with different kinds of linguistic information, ranging from morphological and syntactic information to rich semantic information, according to Frame Semantics, to characterize robot actions, and Spatial Semantics, to capture the robot environment. All texts are represented through the Abstract Meaning Representation, to adopt a simple but expressive representation of commands that can be easily translated into the internal representation of the robot.

UNITOR: Aspect Based Sentiment Analysis with Structured Learning
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

UNITOR-CORE_TYPED: Combining Text Similarity and Semantic Filters through SV Regression
Danilo Croce | Valerio Storch | Roberto Basili
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

UNITOR: Combining Syntactic and Semantic Kernels for Twitter Sentiment Analysis
Giuseppe Castellucci | Simone Filice | Danilo Croce | Roberto Basili
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

UNITOR-HMM-TK: Structured Kernel-based learning for Spatial Role Labeling
Emanuele Bastianelli | Danilo Croce | Roberto Basili | Daniele Nardi
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Towards Compositional Tree Kernels
Paolo Annesi | Danilo Croce | Roberto Basili
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Textual Inference and Meaning Representation in Human Robot Interaction
Emanuele Bastianelli | Giuseppe Castellucci | Danilo Croce | Roberto Basili
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2012

Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Danilo Croce | Alessandro Moschitti | Roberto Basili | Martha Palmer
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

UNITOR: Combining Semantic Text Similarity functions through SV Regression
Danilo Croce | Paolo Annesi | Valerio Storch | Roberto Basili
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Structured Lexical Similarity via Convolution Kernels on Dependency Trees
Danilo Croce | Alessandro Moschitti | Roberto Basili
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Extensive Evaluation of a FrameNet-WordNet mapping resource
Diego De Cao | Danilo Croce | Roberto Basili
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Lexical resources are basic components of many text processing systems devoted to information extraction, question answering or dialogue. In past years many resources have been developed, such as FrameNet and WordNet. FrameNet describes prototypical situations (i.e. Frames) while WordNet defines lexical meaning (senses) for the majority of English nouns, verbs, adjectives and adverbs. A major difference between FrameNet and WordNet lies in their coverage. Because of this lack of coverage, in recent years some approaches have been studied to bridge these two resources, so that one resource is used to extend the coverage of the other. These approaches range from supervised to unsupervised methods. The major problem is that there is no standard for evaluating the mapping. Each work has tested its own approach with a custom gold standard. This work gives an extensive evaluation of the model proposed in (De Cao et al., 2008) using the gold standards proposed in other works. Moreover, this work gives an empirical comparison with other available resources. As an outcome of this work, we also release the full mapping resource built according to the model proposed in (De Cao et al., 2008).

Towards Open-Domain Semantic Role Labeling
Danilo Croce | Cristina Giannone | Paolo Annesi | Roberto Basili
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Robust and Efficient Page Rank for Word Sense Disambiguation
Diego De Cao | Roberto Basili | Matteo Luciani | Francesco Mesiano | Riccardo Rossi
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing

Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics
Roberto Basili | Marco Pennacchiotti
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

2009

Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
Roberto Basili | Marco Pennacchiotti
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

2008

Automatic induction of FrameNet lexical units
Marco Pennacchiotti | Diego De Cao | Roberto Basili | Danilo Croce | Michael Roth
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Tree Kernels for Semantic Role Labeling
Alessandro Moschitti | Daniele Pighin | Roberto Basili
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

Towards a Vector Space Model for FrameNet-like Resources
Marco Pennacchiotti | Diego De Cao | Paolo Marocco | Roberto Basili
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present an original framework to model frame semantic resources (namely, FrameNet) using minimal supervision. This framework can be leveraged both to expand an existing FrameNet with new knowledge and to induce a FrameNet in a new language. Our hypothesis is that a frame semantic resource can be modeled and represented by a suitable semantic space model. The intuition is that semantic spaces effectively model what it means to be characteristic of a frame, for both lexical elements and full sentences. The paper makes two main contributions. First, it shows that our hypothesis is valid and can be successfully implemented. Second, it explores different types of semantic vector space models (VSMs), outlining which one is more suitable for representing a frame semantic resource. In the paper, VSMs are used for modeling the linguistic core of a frame, the lexical units. Indeed, if the hypothesis is verified for these units, the proposed framework has a much wider application.

Combining Word Sense and Usage for Modeling Frame Semantics
Diego De Cao | Danilo Croce | Marco Pennacchiotti | Roberto Basili
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

Exploiting Syntactic and Shallow Semantic Kernels for Question Answer Classification
Alessandro Moschitti | Silvia Quarteroni | Roberto Basili | Suresh Manandhar
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

RTV: Tree Kernels for Thematic Role Classification
Daniele Pighin | Alessandro Moschitti | Roberto Basili
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

A Tree Kernel approach to Question and Answer Classification in Question Answering Systems
Alessandro Moschitti | Roberto Basili
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

A critical step in Question Answering design is the definition of the models for question focus identification and answer extraction. In the case of factoid questions, we can use a question classifier (trained according to a target taxonomy) and a named entity recognizer. Unfortunately, the latter cannot be applied to generate answers to non-factoid questions. In this paper, we tackle this problem by designing classifiers of non-factoid answers. As the feature design for this learning task is very complex, we take advantage of tree kernels to generate large feature sets from the syntactic parse trees of passages relevant to the target question. Such kernels encode syntactic and lexical information in Support Vector Machines, which can decide whether a sentence focuses on a target taxonomy subject. The experiments with SVMs on the TREC 10 dataset show that our approach is an interesting direction for future research.

Tree Kernel Engineering in Semantic Role Labeling Systems
Alessandro Moschitti | Daniele Pighin | Roberto Basili
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

Semantic Role Labeling via Tree Kernel Joint Inference
Alessandro Moschitti | Daniele Pighin | Roberto Basili
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

2005

Engineering of Syntactic Features for Shallow Semantic Parsing
Alessandro Moschitti | Bonaventura Coppola | Daniele Pighin | Roberto Basili
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing

Effective use of WordNet Semantics via Kernel-Based Learning
Roberto Basili | Marco Cammisa | Alessandro Moschitti
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Hierarchical Semantic Role Labeling
Alessandro Moschitti | Ana-Maria Giuglea | Bonaventura Coppola | Roberto Basili
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Verb Subcategorization Kernels for Automatic Semantic Labeling
Alessandro Moschitti | Roberto Basili
Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition

2004

Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text
Louise Guthrie | Roberto Basili | Fabio Zanzotto | Kalina Bontcheva | Hamish Cunningham | David Guthrie | Jia Cui | Marco Cammisa | Jerry Cheng-Chieh Liu | Cassia Farria Martin | Kristiyan Haralambiev | Martin Holub | Klaus Macherey | Frederick Jelinek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A2Q: An Agent-based Architecture for Multilingual Q&A
Roberto Basili | Nicola Lorusso | Maria Teresa Pazienza | Fabio Massimo Zanzotto
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A Similarity Measure for Unsupervised Semantic Disambiguation
Roberto Basili | Marco Cammisa | Fabio Massimo Zanzotto
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Ontological resources and question answering
Roberto Basili | Dorte H. Hansen | Patrizia Paggio | Maria Teresa Pazienza | Fabio Massimo Zanzotto
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

2003

Book Reviews: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms by Thorsten Joachims; Anaphora Resolution by Ruslan Mitkov
Roberto Basili | Michael Strube
Computational Linguistics, Volume 29, Number 4, December 2003

2002

Decision Trees as Explicit Domain Term Definitions
Roberto Basili | Maria Teresa Pazienza | Fabio Massimo Zanzotto
COLING 2002: The 19th International Conference on Computational Linguistics

Knowledge-Based Multilingual Document Analysis
R. Basili | R. Catizone | L. Padro | M.T. Pazienza | G. Rigau | A. Setzer | N. Webb | F. Zanzotto
COLING-02: SEMANET: Building and Using Semantic Networks

2001

Identification of Relevant Terms to Support the Construction of Domain Ontologies
Paola Velardi | Michele Missikoff | Roberto Basili
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

Multilingual Authoring: the NAMIC Approach
Roberto Basili | Maria Teresa Pazienza | Fabio Massimo Zanzotto | Roberta Catizone | Andrea Setzer | Nick Webb | Yorick Wilks | Lluís Padró | German Rigau
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

2000

Customizable Modular Lexicalized Parsing
R. Basili | M. T. Pazienza | F. M. Zanzotto
Proceedings of the Sixth International Workshop on Parsing Technologies

Different NLP applications have different efficiency constraints (i.e. quality of the results and throughput) that reflect on each core linguistic component. Syntactic processors are basic modules in some NLP applications. A customization that permits the performance control of these components enables their reuse in different application scenarios. Throughput has commonly been improved using partial syntactic processors. On the other hand, specialized lexicons are generally employed to improve the quality of the syntactic material produced by specific parsing (sub)processes (e.g. verb argument detection or PP attachment disambiguation). Building upon the idea of grammar stratification, this paper presents a method to push modularity and lexical sensitivity in parsing, in view of customizable syntactic analysers. A framework for modular parser design is proposed and its main properties are discussed. Parsers (i.e. different parsing module chains) are then presented and their performance is analyzed in application-driven scenarios.

Tuning Lexicons to New Operational Scenarios
Roberto Basili | Maria Teresa Pazienza | Michele Vindigni | Fabio Massimo Zanzotto
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

Automatic Adaptation of WordNet to Sublanguages and to Computational Tasks
Roberto Basili | Alessandro Cucchiarelli | Carlo Consoli | Maria Teresa Pazienza | Paola Velardi
Usage of WordNet in Natural Language Processing Systems

1997

Towards a Bootstrapping Framework for Corpus Semantic Tagging
Roberto Basili | Michelangelo Della Rocca | Maria Teresa Pazienza
Tagging Text with Lexical Semantics: Why, What, and How?

Inducing Terminology for Lexical Acquisition
Roberto Basili | Gianluca De Rossi | Maria Teresa Pazienza
Second Conference on Empirical Methods in Natural Language Processing

1996

Integrating General-purpose and Corpus-based Verb Classification
Roberto Basili | Maria Teresa Pazienza | Paola Velardi
Computational Linguistics, Volume 22, Number 4, December 1996

Unsupervised Learning of Syntactic Knowledge: Methods and Measures
R. Basili | A. Marziali | M.T. Pazienza | P. Velardi
Conference on Empirical Methods in Natural Language Processing

1994

Might a semantic lexicon support hypertextual authoring?
Roberto Basili | Fabrizio Grisoli | Maria Teresa Pazienza
Fourth Conference on Applied Natural Language Processing

A “not-so-shallow” parser for collocational analysis
R. Basili | M.T. Pazienza | P. Velardi
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

The Noisy Channel and the Braying Donkey
Roberto Basili | Maria Teresa Pazienza | Paola Velardi
The Balancing Act: Combining Symbolic and Statistical Approaches to Language

1993

Hierarchical Clustering of Verbs
Roberto Basili | Maria Pazienza | Paola Velardi
Acquisition of Lexical Knowledge from Text

1992

Computational Lexicons: the Neat Examples and the Odd Exemplars
Roberto Basili | Maria Teresa Pazienza | Paola Velardi
Third Conference on Applied Natural Language Processing

Co-authors

Paola Velardi 8

Simone Filice 7

Claudiu Daniel Hromei 5

Marco Pennacchiotti 5

Daniele Pighin 5

Emanuele Bastianelli 3

Marco Cammisa 3

Daniele Margiotta 3

Daniele Nardi 3

Daniele Rossini 3

Federico Borazio 2

Silvia Brambilla 2

Roberta Catizone 2

Bonaventura Coppola 2

Rodolfo Delmonte 2

Seyed Alireza Mousavian Anaraki 2

Malvina Nissim 2

Lluís Padró 2

Giorgio Satta 2

Antonio Scaiella 2

Andrea Setzer 2

Valerio Storch 2

Fabio Tamburini 2

Valerio Basile 1

Marco Battista 1

Valentina Bellomaria 1

Kalina Bontcheva 1

Cristina Bosco 1

Giorgio Brandi 1

Niels Jonas Bugge 1

Nicoletta Calzolari 1

Andrea Cannone 1

Carlo Consoli 1

Ornella Corazzari 1

Alessandro Cucchiarelli 1

Hamish Cunningham 1

Giacomo De Luca 1

Francesco De Michele 1

Gianluca De Rossi 1

Martina Del Manso 1

Michelangelo Della Rocca 1

Federica Ferraro 1

Federico Fiori Nastro 1

Paolo Fiori Nastro 1

Giorgio Gambosi 1

Cristina Giannone 1

Ana-Maria Giuglea 1

Fabrizio Grisoli 1

Louise Guthrie 1

David Guthrie 1

Dorte Haltrup Hansen 1

Kristiyan Haralambiev 1

Muhammad Okky Ibrohim 1

Frederick Jelinek 1

Alessandro Lenci 1

Jerry Cheng-Chieh Liu 1

Nicola Lorusso 1

Matteo Luciani 1

Klaus Macherey 1

Suresh Manandhar 1

Paolo Marocco 1

Cassia Farria Martin 1

Caterina Masotti 1

Francesco Mesiano 1

Chantal Michel 1

Daniele Mipatrini 1

Michele Missikoff 1

Simonetta Montemagni 1

Patrizia Paggio 1

Martha Palmer 1

Lucia C. Passaro 1

Elisa Passone 1

Daniele Petrone 1

Patrizio Pezzotti 1

Fabio Pianesi 1

Silvia Quarteroni 1

Remo Raffaelli 1

Shahid Iqbal Rai 1

Flavia Riccardo 1

Riccardo Rossi 1

Stefanie Schmidt 1

Frauke Schultze-Lutter 1

Michael Strube 1

Alberto M. Urdiales 1

Michele Vindigni 1

Antonio Zampolli 1

F. M. Zanzotto 1

Venues