Alessandro Lenci


2021

PIHKers at CMCL 2021 Shared Task: Cosine Similarity and Surprisal to Predict Human Reading Patterns.
Lavinia Salicchi | Alessandro Lenci
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Eye-tracking psycholinguistic studies have revealed that context-word semantic coherence and predictability influence language processing. In this paper we present our approach to predicting eye-tracking features from the ZuCo dataset for the shared task of the Cognitive Modeling and Computational Linguistics (CMCL2021) workshop. Using both cosine similarity and surprisal within a regression model, we significantly improved on the baseline Mean Absolute Error computed across five eye-tracking features.

Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Paolo Pedinotti | Giulia Rambelli | Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Prior research has explored the ability of computational models to predict a word's semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether and to what extent computational approaches have access to the information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformer Language Models (TLMs), we decided to test them on a benchmark for the dynamic estimation of thematic fit. The evaluation of these models was performed in comparison with SDM, a framework specifically designed to integrate events in sentence meaning representations, and we conducted a detailed error analysis to investigate which factors affect their behavior. Our results show that TLMs can reach performances that are comparable to those achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge, and their predictions often depend on surface linguistic features, such as frequent words, collocations and syntactic patterns, thereby showing sub-optimal generalization abilities.

2020

Don’t Invite BERT to Drink a Bottle: Modeling the Interpretation of Metonymies Using BERT and Distributional Representations
Paolo Pedinotti | Alessandro Lenci
Proceedings of the 28th International Conference on Computational Linguistics

In this work, we carry out two experiments in order to assess the ability of BERT to capture the meaning shift associated with metonymic expressions. We test the model on a new dataset that is representative of the most common types of metonymy. We compare BERT with the Structured Distributional Model (SDM), a model for the representation of words in context which is based on the notion of Generalized Event Knowledge. The results reveal that, while BERT's ability to deal with metonymy is quite limited, SDM is good at predicting the meaning of metonymic expressions, providing support for an account of metonymy based on event knowledge.

“Voices of the Great War”: A Richly Annotated Corpus of Italian Texts on the First World War
Federico Boschetti | Irene De Felice | Stefano Dei Rossi | Felice Dell’Orletta | Michele Di Giorgio | Martina Miliani | Lucia C. Passaro | Angelica Puddu | Giulia Venturi | Nicola Labanca | Alessandro Lenci | Simonetta Montemagni
Proceedings of the 12th Language Resources and Evaluation Conference

“Voices of the Great War” is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view, it accounts for the wide range of varieties in which Italian was articulated in that period, namely from the diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. Second, from the historical perspective, through a collection of texts belonging to different genres, it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is fully annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web Interface for navigating it.

Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni | Ludovica Pannitto | Enrico Santus | Alessandro Lenci | Chu-Ren Huang
Proceedings of the 12th Language Resources and Evaluation Conference

While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, by taking into account a larger number of parameters and verb roles and also introducing dependency-based embeddings in the comparison. Our results show a complex scenario, where a determining factor for the performance seems to be the availability to the model of reliable syntactic information for building the distributional representations of the roles.

Representing Verbs with Visual Argument Vectors
Irene Sucameli | Alessandro Lenci
Proceedings of the 12th Language Resources and Evaluation Conference

Is it possible to use images to model verb semantic similarities? Starting from this core question, we developed two textual distributional semantic models and a visual one. We found it particularly interesting and challenging to investigate this part of speech, since verbs are not often analysed in research focused on multimodal distributional semantics. After the creation of the visual and textual distributional spaces, the three models were evaluated in relation to SimLex-999, a gold standard resource. Through this evaluation, we demonstrate that, using visual distributional models, it is possible to extract meaningful information and to effectively capture the semantic similarity between verbs.

FRAQUE: a FRAme-based QUEstion-answering system for the Public Administration domain
Martina Miliani | Lucia C. Passaro | Alessandro Lenci
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

In this paper, we propose FRAQUE, a question answering system for factoid questions in the Public Administration domain. The system is based on semantic frames, here intended as collections of slots typed with their possible values. FRAQUE queries unstructured textual data and exploits the potential of different approaches: it extracts pattern elements from texts which are linguistically analyzed through statistical methods. FRAQUE allows Italian users to query vast document repositories related to the domain of Public Administration. Given the statistical nature of most of its components such as word embeddings, the system allows for a flexible domain and language adaptation process. FRAQUE’s goal is to associate questions with frames stored in a Knowledge Graph along with relevant document passages, which are returned as the answer.

Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Michael Zock | Emmanuele Chersoni | Alessandro Lenci | Enrico Santus
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

Comparing Probabilistic, Distributional and Transformer-Based Models on Logical Metonymy Interpretation
Giulia Rambelli | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In linguistics and cognitive science, logical metonymies are defined as type clashes between an event-selecting verb and an entity-denoting noun (e.g. The editor finished the article), which are typically interpreted by inferring a hidden event (e.g. reading) on the basis of contextual cues. This paper tackles the problem of logical metonymy interpretation, that is, the retrieval of the covert event via computational methods. We compare different types of models, including the probabilistic and the distributional ones previously introduced in the literature on the topic. For the first time, we also tested some of the recent Transformer-based models, such as BERT, RoBERTa, XLNet, and GPT-2, on this task. Our results show a complex scenario, in which the best Transformer-based models and some traditional distributional models perform very similarly. However, the low performance on some of the testing datasets suggests that logical metonymy is still a challenging phenomenon for computational modeling.

PISA: A measure of Preference In Selection of Arguments to model verb argument recoverability
Giulia Cappelli | Alessandro Lenci
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Our paper offers a computational model of the semantic recoverability of verb arguments, tested in particular on direct objects and Instruments. Our fully distributional model is intended to improve on older taxonomy-based models, which require a lexicon in addition to the training corpus. We computed the selectional preferences of 99 transitive verbs and 173 Instrument verbs as the mean value of the pairwise cosines between their arguments (a weighted mean between all the arguments, or an unweighted mean with the topmost k arguments). Results show that our model can predict the recoverability of objects and Instruments, providing a similar result to that of taxonomy-based models but at a much cheaper computational cost.
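The core quantity named in this abstract, the mean of the pairwise cosines between a verb's arguments, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of the product of argument frequencies as pair weights, and the frequency-based choice of the topmost k arguments are all assumptions made for the example.

```python
import itertools

import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_pairwise_cosine(vectors, weights=None):
    """Mean cosine over all unordered pairs of argument vectors.

    If `weights` is given (one value per argument, e.g. its frequency with
    the verb), each pair is weighted by the product of its two weights.
    That pair-weighting scheme is an assumption of this sketch.
    """
    pairs = list(itertools.combinations(range(len(vectors)), 2))
    if not pairs:
        raise ValueError("need at least two argument vectors")
    sims = np.array([cosine(vectors[i], vectors[j]) for i, j in pairs])
    if weights is None:
        return float(sims.mean())
    pair_weights = np.array([weights[i] * weights[j] for i, j in pairs], dtype=float)
    return float(np.average(sims, weights=pair_weights))


def top_k_arguments(vectors, freqs, k):
    """Keep the k arguments with the highest frequency (hypothetical criterion)."""
    order = np.argsort(freqs)[::-1][:k]
    return [vectors[i] for i in order]
```

For a verb whose arguments are stored as a list of vectors with matching frequencies, `mean_pairwise_cosine(vectors, freqs)` gives the weighted variant and `mean_pairwise_cosine(top_k_arguments(vectors, freqs, k))` the unweighted top-k variant. Under this reading, a higher score means more mutually similar arguments, that is, a stronger selectional preference.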

2019

Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Alessandro Lenci | Tal Linzen | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Distributional Semantics Meets Construction Grammar. Towards a Unified Usage-Based Model of Grammar and Meaning
Giulia Rambelli | Emmanuele Chersoni | Philippe Blache | Chu-Ren Huang | Alessandro Lenci
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.

2018

Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing
Marco Idiart | Alessandro Lenci | Thierry Poibeau | Aline Villavicencio
Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing

Modeling Violations of Selectional Restrictions with Distributional Semantics
Emmanuele Chersoni | Adrià Torrens Urrutia | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability to identify violations of selectional restrictions in two datasets from experimental studies.

SemEval-2018 Task 10: Capturing Discriminative Attributes
Alicia Krebs | Alessandro Lenci | Denis Paperno
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Participants were asked to identify whether an attribute could help discriminate between two concepts. For example, a successful system should determine that ‘urine’ is a discriminating feature in the word pair ‘kidney’, ‘bone’. The aim of the task is to better evaluate the capabilities of state of the art semantic models, beyond pure semantic similarity. The task attracted submissions from 21 teams, and the best system achieved a 0.75 F1 score.

Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics
Malvina Nissim | Jonathan Berant | Alessandro Lenci
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

2017

UDLex: Towards Cross-language Subcategorization Lexicons
Giulia Rambelli | Alessandro Lenci | Thierry Poibeau
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Emmanuele Chersoni | Enrico Santus | Philippe Blache | Alessandro Lenci
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

Logical Metonymy in a Distributional Model of Sentence Comprehension
Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013, 2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.

Measuring Thematic Fit with Distributional Feature Overlap
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.

2016

The Effects of Data Size and Frequency Range on Distributional Semantic Models
Magnus Sahlgren | Alessandro Lenci
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

LexFr: Adapting the LexIt Framework to Build a Corpus-based French Subcategorization Lexicon
Giulia Rambelli | Gianluca Lebani | Laurent Prévot | Alessandro Lenci
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces LexFr, a corpus-based French lexical resource built by adapting the framework LexIt, originally developed to describe the combinatorial potential of Italian predicates. As in the original framework, the behavior of a group of target predicates is characterized by a series of syntactic (i.e., subcategorization frames) and semantic (i.e., selectional preferences) statistical information (a.k.a. distributional profiles) whose extraction process is mostly unsupervised. The first release of LexFr includes information for 2,493 verbs, 7,939 nouns and 2,628 adjectives. In these pages we describe the adaptation process and evaluate the final resource by comparing the information collected for 20 test verbs against the information available in a gold standard dictionary. In the best performing setting, we obtained 0.74 precision, 0.66 recall and 0.70 F-measure.

Evaluating Context Selection Strategies to Build Emotive Vector Space Models
Lucia C. Passaro | Alessandro Lenci
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we compare different context selection approaches to improve the creation of Emotive Vector Space Models (VSMs). The system is based on the results of an existing approach that showed the possibility to create and update VSMs by exploiting crowdsourcing and human annotation. Here, we introduce a method to manipulate the contexts of the VSMs under the assumption that the emotive connotation of a target word is a function of both its syntagmatic and paradigmatic association with the various emotions. To study the differences among the proposed spaces and to confirm the reliability of the system, we report on two experiments: in the first one we validated the best candidates extracted from each model, and in the second one we compared the models’ performance on a random sample of target words. Both experiments have been implemented as crowdsourcing tasks.

Italian VerbNet: A Construction-based Approach to Italian Verb Classification
Lucia Busso | Alessandro Lenci
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper proposes a new method for Italian verb classification (and a preliminary example of the resulting classes) inspired by Levin (1993) and VerbNet (Kipper-Schuler, 2005), yet partially independent from these resources; we achieved such a result by integrating Levin and VerbNet’s models of classification with other theoretical frameworks and resources. The classification is rooted in the constructionist framework (Goldberg, 1995; 2006) and is distribution-based. It is also semantically characterized by a link to FrameNet’s semantic frames to represent the event expressed by a class. However, the new Italian classes maintain the hierarchic “tree” structure and monotonic nature of VerbNet’s classes, and, where possible, the original names (e.g.: Verbs of Killing, Verbs of Putting, etc.). We therefore propose here a taxonomy compatible with VerbNet but at the same time adapted to Italian syntax and semantics. It also addresses a number of problems intrinsic to the original classifications, such as the role of argument alternations, here regarded simply as epiphenomena, consistently with the constructionist approach.

Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
Enrico Santus | Alessandro Lenci | Tin-Shing Chiu | Qin Lu | Chu-Ren Huang
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare the performance with the state-of-the-art, we have also evaluated ROOT9 on subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias.

What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Enrico Santus | Alessandro Lenci | Tin-Shing Chiu | Qin Lu | Chu-Ren Huang
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but they share a larger portion of their most relevant contexts compared to other related words. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
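The rank-weighted intersection described in this abstract can be illustrated with a short sketch. It assumes that each target word comes with a mapping from contexts to association scores (for instance PPMI values, which the abstract does not specify), that only the top `n` contexts per word are kept, and that each shared context contributes the inverse of its average rank in the two lists. The function names and the default `n` are hypothetical.

```python
def top_ranked(context_scores, n):
    """Map each of the n highest-scoring contexts to its 1-based rank."""
    ranked = sorted(context_scores, key=context_scores.get, reverse=True)[:n]
    return {context: rank for rank, context in enumerate(ranked, start=1)}


def apsyn(contexts_1, contexts_2, n=1000):
    """Rank-weighted overlap between the top-n contexts of two words.

    Each shared context adds 1 / (average of its two ranks), so contexts
    that are highly ranked for both words contribute the most.
    """
    ranks_1 = top_ranked(contexts_1, n)
    ranks_2 = top_ranked(contexts_2, n)
    shared = ranks_1.keys() & ranks_2.keys()
    return sum(1.0 / ((ranks_1[c] + ranks_2[c]) / 2.0) for c in shared)
```

In a synonym test such as ESL or TOEFL, one would compute this score between the target word and each candidate and pick the highest-scoring candidate. Unlike cosine, the score ignores the magnitude of the association values and depends only on which contexts are shared and how highly they rank.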

Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models
Marco Silvio Giuseppe Senaldi | Gianluca E. Lebani | Alessandro Lenci
Proceedings of the 12th Workshop on Multiword Expressions

Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning
Anna Korhonen | Alessandro Lenci | Brian Murphy | Thierry Poibeau | Aline Villavicencio
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning

Towards a Distributional Model of Semantic Complexity
Emmanuele Chersoni | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework (Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event represented by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).

Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)
Michael Zock | Alessandro Lenci | Stefan Evert
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

“Beware the Jabberwock, dear reader!” Testing the distributional reality of construction semantics
Gianluca Lebani | Alessandro Lenci
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Notwithstanding the success of the notion of construction, the computational tradition still lacks a way to represent the semantic content of these linguistic entities. Here we present a simple corpus-based model implementing the idea that the meaning of a syntactic construction is intimately related to the semantics of its typical verbs. It is a two-step process, which starts by identifying the typical verbs occurring with a given syntactic construction and building their distributional vectors. We then calculate the weighted centroid of these vectors in order to derive the distributional signature of a construction. In order to assess the goodness of our approach, we replicated the priming effect described by Johnson and Goldberg (2013) as a function of the semantic distance between a construction and its prototypical verbs. Additional support for our view comes from a regression analysis showing that our distributional information can be used to model behavioral data collected with a crowdsourced elicitation experiment.
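The two-step procedure in this abstract ends with a weighted centroid of verb vectors. A minimal sketch of that step is given below. It assumes that the typical verbs and their vectors have already been selected and that each verb carries a non-negative weight (for example its association with the construction; the abstract does not say which weighting is used). The function names are hypothetical.

```python
import numpy as np


def weighted_centroid(verb_vectors, weights):
    """Weighted average of the verb vectors, one weight per verb."""
    vectors = np.asarray(verb_vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.average(vectors, axis=0, weights=w)


def cosine(u, v):
    """Cosine similarity, used here as a proxy for semantic closeness."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

The resulting vector plays the role of the construction's distributional signature, and `cosine(signature, verb_vector)` can then stand in for the semantic distance between the construction and a given verb.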

The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations
Enrico Santus | Anna Gladkova | Stefan Evert | Alessandro Lenci
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The shared task of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) aims at providing a common benchmark for testing current corpus-based methods for the identification of lexical semantic relations (synonymy, antonymy, hypernymy, part-whole meronymy) and at gaining a better understanding of their respective strengths and weaknesses. The shared task uses a challenging dataset extracted from EVALution 1.0, which contains word pairs holding the above-mentioned relations as well as semantically unrelated control items (random). The task is split into two subtasks: (i) identification of related word pairs vs. unrelated ones; (ii) classification of the word pairs according to their semantic relation. This paper describes the subtasks, the dataset, the evaluation metrics, the seven participating systems and their results. The best performing system in subtask 1 is GHHH (F1 = 0.790), while the best system in subtask 2 is LexNet (F1 = 0.445). The dataset and the task description are available at https://sites.google.com/site/cogalex2016/home/shared-task.

Antonymy and Canonicity: Experimental and Distributional Evidence
Andreana Pastena | Alessandro Lenci
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The present paper investigates the phenomenon of antonym canonicity by providing new behavioural and distributional evidence on Italian adjectives. Previous studies have shown that some pairs of antonyms are perceived to be better examples of opposition than others, and are thus considered representative of the whole category (e.g., Deese, 1964; Murphy, 2003; Paradis et al., 2009). Our goal is to further investigate why such canonical pairs (Murphy, 2003) exist and how they come to be associated. In the literature, two different approaches have dealt with this issue. The lexical-categorical approach (Charles and Miller, 1989; Justeson and Katz, 1991) finds the cause of canonicity in the high co-occurrence frequency of the two adjectives. The cognitive-prototype approach (Paradis et al., 2009; Jones et al., 2012) instead claims that two adjectives form a canonical pair because they are aligned along a simple and salient dimension. Our empirical evidence, while supporting the latter view, shows that the paradigmatic distributional properties of adjectives can also contribute to explaining the phenomenon of canonicity, providing a corpus-based correlate of the cognitive notion of salience.

Testing APSyn against Vector Cosine on Similarity Estimation
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Chu-Ren Huang | Philippe Blache
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2015

Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning
Robert Berwick | Anna Korhonen | Alessandro Lenci | Thierry Poibeau | Aline Villavicencio
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

EVALution 1.0: an Evolving Semantic Dataset for Training and Evaluation of Distributional Semantic Models
Enrico Santus | Frances Yung | Alessandro Lenci | Chu-Ren Huang
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

2014

Chasing Hypernyms in Vector Spaces with Entropy
Enrico Santus | Alessandro Lenci | Qin Lu | Sabine Schulte im Walde
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Taking Antonymy Mask off in Vector Space
Enrico Santus | Qin Lu | Alessandro Lenci | Chu-Ren Huang
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

The PAISÀ Corpus of Italian Web Texts
Verena Lyding | Egon Stemle | Claudia Borghetti | Marco Brunello | Sara Castagnoli | Felice Dell’Orletta | Henrik Dittmann | Alessandro Lenci | Vito Pirrelli
Proceedings of the 9th Web as Corpus Workshop (WaC-9)

Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)
Alessandro Lenci | Muntsa Padró | Thierry Poibeau | Aline Villavicencio
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)

Crowdsourcing for the identification of event nominals: an experiment
Rachele Sprugnoli | Alessandro Lenci
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the design and results of a crowdsourcing experiment on the recognition of Italian event nominals. The aim of the experiment was to assess the feasibility of crowdsourcing methods for a complex semantic task such as distinguishing the eventive interpretation of polysemous nominals taking into consideration various types of syntagmatic cues. Details on the theoretical background and on the experimental setup are provided together with the final results in terms of accuracy and inter-annotator agreement. These results are compared with the ones obtained by expert annotators on the same task. The low values in accuracy and Fleiss’ kappa of the crowdsourcing experiment demonstrate that crowdsourcing is not always optimal for complex linguistic tasks. On the other hand, the use of non-expert contributors makes it possible to understand which are the most ambiguous patterns of polysemy and the most useful syntagmatic cues to be used to identify the eventive reading of nominals.

Bootstrapping an Italian VerbNet: data-driven analysis of verb alternations
Gianluca Lebani | Veronica Viola | Alessandro Lenci
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The goal of this paper is to propose a classification of the syntactic alternations admitted by the most frequent Italian verbs. The data-driven two-step procedure exploited and the structure of the identified classes of alternations are presented in depth and discussed. Even if this classification has been developed with a practical application in mind, namely the semi-automatic building of a VerbNet-like lexicon for Italian verbs, partly following the methodology proposed in the context of the VerbNet project, its availability may have a positive impact on several related research topics and Natural Language Processing tasks.

pdf bib
Choosing which to use? A study of distributional models for nominal lexical semantic classification
Lauren Romeo | Gianluca Lebani | Núria Bel | Alessandro Lenci
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper empirically evaluates the performances of different state-of-the-art distributional models in a nominal lexical semantic classification task. We consider models that exploit various types of distributional features, which thereby provide different representations of nominal behavior in context. The experiments presented in this work demonstrate the advantages and disadvantages of each model considered. This analysis also considers a combined strategy that we found to be capable of leveraging the bottlenecks of each model, especially when large robust data is not available.

2013

pdf bib
Fitting, Not Clashing! A Distributional Semantic Model of Logical Metonymy
Alessandra Zarcone | Alessandro Lenci | Sebastian Padó | Jason Utt
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers

pdf bib
The Curious Case of Metonymic Verbs: A Distributional Characterization
Jason Utt | Alessandro Lenci | Sebastian Padó | Alessandra Zarcone
Proceedings of the IWCS 2013 Workshop Towards a Formal Distributional Semantics

2012

pdf bib
Identifying hypernyms in distributional semantic spaces
Alessandro Lenci | Giulia Benotto
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Unseen features. Collecting semantic data from congenital blind subjects
Alessandro Lenci | Marco Baroni | Giovanna Marotta
Proceedings of the Workshop on Computational Models of Language Acquisition and Loss

pdf bib
LexIt: A Computational Resource on Italian Argument Structure
Alessandro Lenci | Gabriella Lapesa | Giulia Bonansinga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The aim of this paper is to introduce LexIt, a computational framework for the automatic acquisition and exploration of distributional information about Italian verbs, nouns and adjectives, freely available through a web interface at the address http://sesia.humnet.unipi.it/lexit. LexIt is the first large-scale resource for Italian in which subcategorization and semantic selection properties are characterized fully on distributional ground: in the paper we describe both the process of data extraction and the evaluation of the subcategorization frames extracted with LexIt.

pdf bib
Enriching the ISST-TANL Corpus with Semantic Frames
Alessandro Lenci | Simonetta Montemagni | Giulia Venturi | Maria Grazia Cutrullà
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper describes the design and the results of a manual annotation methodology devoted to enrich the ISST-TANL Corpus, derived from the Italian Syntactic-Semantic Treebank (ISST), with Semantic Frames information. The main issues encountered in applying the English FrameNet annotation criteria to a corpus of Italian language are discussed together with the choice of anchoring the semantic annotation layer to the underlying dependency syntactic structure. The results of a case study aimed at extending and specialising this methodology for the annotation of a corpus of legislative texts are also discussed.

2011

pdf bib
Composing and Updating Verb Argument Expectations: A Distributional Semantic Model
Alessandro Lenci
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
How we BLESSed distributional semantic evaluation
Marco Baroni | Alessandro Lenci
Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics

2010

pdf bib
Comparing the Influence of Different Treebank Annotations on Dependency Parsing
Cristina Bosco | Simonetta Montemagni | Alessandro Mazzei | Vincenzo Lombardo | Felice Dell’Orletta | Alessandro Lenci | Leonardo Lesmo | Giuseppe Attardi | Maria Simi | Alberto Lavelli | Johan Hall | Jens Nilsson | Joakim Nivre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the interest of the NLP community grows to develop several treebanks also for languages other than English, we observe efforts towards evaluating the impact of different annotation strategies used to represent particular languages or with reference to particular tasks. This paper contributes to the debate on the influence of resources used for the training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested with respect to two treebanks for the Italian language, namely TUT and ISST-TANL, which differ significantly at the level of both corpus composition and adopted dependency representations.

pdf bib
A Resource and Tool for Super-sense Tagging of Italian Texts
Giuseppe Attardi | Stefano Dei Rossi | Giulia Di Pietro | Alessandro Lenci | Simonetta Montemagni | Maria Simi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A SuperSense Tagger is a tool for the automatic analysis of texts that associates to each noun, verb, adjective and adverb a semantic category within a general taxonomy. The developed tagger, based on a statistical model (Maximum Entropy), required the creation of an Italian annotated corpus, to be used as a training set, and the improvement of various existing tools. The obtained results significantly improved the current state-of-the-art for this particular task.

pdf bib
Building an Italian FrameNet through Semi-automatic Corpus Analysis
Alessandro Lenci | Martina Johnson | Gabriella Lapesa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we outline the methodology we adopted to develop a FrameNet for Italian. The main element of novelty with respect to the original FrameNet is represented by the fact that the creation and annotation of Lexical Units is strictly grounded in distributional information (statistical distribution of verbal subcategorization frames, lexical and semantic preferences of each frame) automatically acquired from a large, dependency-parsed corpus. We claim that this approach allows us to overcome some of the shortcomings of the classical lexicographic method used to create FrameNet, by complementing the accuracy of manual annotation with the robustness of data on the global distributional patterns of a verb. In the paper, we describe our method for extracting distributional data from the corpus and the way we used it for the encoding and annotation of LUs. The long-term goal of our project is to create an electronic lexicon for Italian similar to the original English FrameNet. For the moment, we have developed a database of syntactic valences that will be made freely accessible via a web interface. This represents an autonomous resource besides the FrameNet lexicon, of which we have a beginning nucleus consisting of 791 annotated sentences.

pdf bib
BabyExp: Constructing a Huge Multimodal Resource to Acquire Commonsense Knowledge Like Children Do
Massimo Poesio | Marco Baroni | Oswald Lanz | Alessandro Lenci | Alexandros Potamianos | Hinrich Schütze | Sabine Schulte im Walde | Luca Surian
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

There is by now widespread agreement that the most realistic way to construct the large-scale commonsense knowledge repositories required by natural language and artificial intelligence applications is by letting machines learn such knowledge from large quantities of data, like humans do. A lot of attention has consequently been paid to the development of increasingly sophisticated machine learning algorithms for knowledge extraction. However, the nature of the input that humans are exposed to while learning commonsense knowledge has received much less attention. The BabyExp project is collecting very dense audio and video recordings of the first 3 years of life of a baby. The corpus constructed in this way will be transcribed with automated techniques and made available to the research community. Moreover, techniques to extract commonsense conceptual knowledge incrementally from these multimodal data are also being explored within the project. The current paper describes BabyExp in general, and presents pilot studies on the feasibility of the automated audio and video transcriptions.

pdf bib
Distributional Memory: A General Framework for Corpus-Based Semantics
Marco Baroni | Alessandro Lenci
Computational Linguistics, Volume 36, Issue 4 - December 2010

2009

pdf bib
One Distributional Memory, Many Semantic Spaces
Marco Baroni | Alessandro Lenci
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

2008

pdf bib
Computational Models for Event Type Classification in Context
Alessandra Zarcone | Alessandro Lenci
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Verb lexical semantic properties are only one of the factors that contribute to the determination of the event type expressed by a sentence, which is instead the result of a complex interplay between the verb meaning and its linguistic context. We report on two computational models for the automatic identification of event type in Italian. Both models use linguistically-motivated features extracted from Italian corpora. The main goal of our experiments is to evaluate the contribution of different types of linguistic indicators to identify the event type of a sentence, as well as to model various cases of context-driven event type shift. In the first model, event type identification has been modelled as a supervised classification task, performed with Maximum Entropy classifiers. In the second model, Self-Organizing Maps have been used to define and identify event types in an unsupervised way. The interaction of various contextual factors in determining the event type expressed by a sentence makes event type identification a highly challenging task. Computational models can help us to shed new light on the real structure of event type classes as well as to gain a better understanding of context-driven semantic shifts.

pdf bib
Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora
Alessandro Lenci | Barbara McGillivray | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we reported experiments of unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics not relying on any previous lexico-syntactic knowledge about SCFs. Although preliminary, reported results are in line with state-of-the-art lexical acquisition systems. The issue of whether verbs sharing similar SCFs distributions happen to share similar semantic properties as well was also explored by clustering verbs that share frames with the same distribution using the Minimum Description Length Principle (MDL). First experiments in this direction were carried out on Italian verbs with encouraging results.

2007

pdf bib
ISA meets Lara: An incremental word space model for cognitively plausible simulations of semantic learning
Marco Baroni | Alessandro Lenci | Luca Onnis
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

2006

pdf bib
Probing the Space of Grammatical Variation: Induction of Cross-Lingual Grammatical Constraints from Treebanks
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

pdf bib
Searching treebanks for functional constraints: cross-lingual experiments in grammatical relation assignment
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper reports on a detailed quantitative analysis of distributional language data of both Italian and Czech, highlighting the relative contribution of a number of distributed grammatical factors to sentence-based identification of subjects and direct objects. The work is based on a Maximum Entropy model of stochastic resolution of grammatical conflicting constraints, and is demonstrably capable of putting explanatory theoretical accounts to the challenging test of an extensive, usage-based empirical verification.

pdf bib
Creation and Use of Lexicons and Ontologies for NL Interfaces to Databases
Roberto Bartolini | Caterina Caracciolo | Emiliano Giovanetti | Alessandro Lenci | Simone Marchi | Vito Pirrelli | Chiara Renso | Laura Spinsanti
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present an original approach to natural language query interpretation which has been implemented within the FuLL (Fuzzy Logic and Language) Italian project of BC S.r.l. In particular, we discuss here the creation of linguistic and ontological resources, together with the exploitation of existing ones, for natural language-driven database access and retrieval. Both the database and the queries we experiment with are Italian, but the methodology we broach naturally extends to other languages.

2005

pdf bib
Climbing the Path to Grammar: A Maximum Entropy Model of Subject/Object Learning
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition

2004

pdf bib
Towards a Language Infrastructure for the Semantic Web
Thierry Declerck | Paul Buitelaar | Nicoletta Calzolari | Alessandro Lenci
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs
Nicoletta Calzolari | Khalid Choukri | Maria Gavrilidou | Bente Maegaard | Paola Baroni | Hanne Fersøe | Alessandro Lenci | Valérie Mapelli | Monica Monachini | Stelios Piperidis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Hybrid Constraints for Robust Parsing: First Experiments and Evaluation
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Content Interoperability of Lexical Resources: Open Issues and “MILE” Perspectives
Francesca Bertagna | Alessandro Lenci | Monica Monachini | Nicoletta Calzolari
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
RDF Instantiation of ISLE/MILE Lexical Entries
Nancy Ide | Alessandro Lenci | Nicoletta Calzolari
Proceedings of the ACL 2003 Workshop on Linguistic Annotation: Getting the Model Right

2002

pdf bib
From Resources to Applications. Designing the Multilingual ISLE Lexical Entry
Sue Atkins | Nuria Bel | Francesca Bertagna | Pierrette Bouillon | Nicoletta Calzolari | Christiane Fellbaum | Ralph Grishman | Alessandro Lenci | Catherine MacLeod | Martha Palmer | Gregor Thurmair | Marta Villegas | Antonio Zampolli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Towards Best Practice for Multiword Expressions in Computational Lexicons
Nicoletta Calzolari | Charles J. Fillmore | Ralph Grishman | Nancy Ide | Alessandro Lenci | Catherine MacLeod | Antonio Zampolli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Multilingual Summarization by Integrating Linguistic Resources in the MLIS-MUSI Project
Alessandro Lenci | Roberto Bartolini | Nicoletta Calzolari | Ana Agua | Stephan Busemann | Emmanuel Cartier | Karine Chevreau | José Coch
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
The Lexicon-Grammar Balance in Robust Parsing of Italian
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Broadening the Scope of the EAGLES/ISLE Lexical Standardization Initiative
Nicoletta Calzolari | Alessandro Lenci | Francesca Bertagna | Antonio Zampolli
COLING-02: The 3rd Workshop on Asian Language Resources and International Standardization

pdf bib
Grammar and Lexicon in the Robust Parsing of Italian towards a Non-Naïve Interplay
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
COLING-02: Grammar Engineering and Evaluation

2001

pdf bib
The ISLE in the ocean. Transatlantic standards for multilingual lexicons (with an eye to machine translation)
Nicoletta Calzolari | Alessandro Lenci | Antonio Zampolli | Nuria Bel | Marta Villegas | Gregor Thurmair
Proceedings of Machine Translation Summit VIII

The ISLE project is a continuation of the long standing EAGLES initiative, carried out under the Human Language Technology (HLT) programme in collaboration between American and European groups in the framework of the EU-US International Research Co-operation, supported by NSF and EC. In this paper we concentrate on the current position of the ISLE Computational Lexicon Working Group (CLWG), whose activities aim at defining a general schema for a multilingual lexical entry (MILE), as the basis for a standard framework for multilingual computational lexicons. The needs and features of existing Machine Translation systems provide the main reference points for the process of consensual definition of the MILE. The overall structure of the MILE will be illustrated with particular attention to some of the issues raised for multilingual lexicons by the need of expressing complex transfer conditions among translation equivalents.

pdf bib
International Standards for Multilingual Resource Sharing: The ISLE Computational Lexicon Working Group
Nicoletta Calzolari | Alessandro Lenci | Antonio Zampolli
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

2000

pdf bib
SIMPLE: A General Framework for the Development of Multilingual Lexicons
Nuria Bel | Federica Busa | Nicoletta Calzolari | Elisabetta Gola | Alessandro Lenci | Monica Monachini | Antoine Ogonowski | Ivonne Peters | Wim Peters | Nilda Ruimy | Marta Villegas | Antonio Zampolli
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Multilingual Linguistic Resources: From Monolingual Lexicons to Bilingual Interrelated Lexicons
Marta Villegas | Nuria Bel | Alessandro Lenci | Nicoletta Calzolari | Nilda Ruimy | Antonio Zampolli | Teresa Sadurní | Joan Soler
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Where Opposites Meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

pdf bib
FAME: a Functional Annotation Meta-scheme for multi-modal and multi-lingual Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Computer Mediated Language Assessment and Evaluation in Natural Language Processing
