2022
Multi3Generation: Multitask, Multilingual, Multimodal Language Generation
Anabela Barreiro | José GC de Souza | Albert Gatt | Mehul Bhatt | Elena Lloret | Aykut Erdem | Dimitra Gkatzia | Helena Moniz | Irene Russo | Fabio Kepler | Iacer Calixto | Marcin Paprzycki | François Portet | Isabelle Augenstein | Mirela Alhasani
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This “meta-paper” will serve as a reference for citations of the Action in future publications. It presents the objectives, challenges and links to the achieved outcomes.
Creative Text-to-Image Generation: Suggestions for a Benchmark
Irene Russo
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
Language models for text-to-image generation can output good quality images when referential aspects of pictures are evaluated. The generation of creative images is not under scrutiny at the moment, but it poses interesting challenges: should we expect more creative images using more creative prompts? What is the relationship between prompts and images in the global process of human evaluation? In this paper, we want to highlight several criteria that should be taken into account for building a creative text-to-image generation benchmark, collecting insights from multiple disciplines (e.g., linguistics, cognitive psychology, philosophy, psychology of art).
2021
archer at SemEval-2021 Task 1: Contextualising Lexical Complexity
Irene Russo
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Evaluating the complexity of a target word in a sentential context is the aim of the Lexical Complexity Prediction task at SemEval-2021. This paper presents the system created to assess the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors. Beyond encoding out-of-context information about the lemma, we implemented features based on pre-trained language models to model the target word’s in-context complexity.
2020
Guessing the Age of Acquisition of Italian Lemmas through Linear Regression
Irene Russo
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
The age of acquisition of a word is a psycholinguistic variable concerning the age at which a word is typically learned. It correlates with other psycholinguistic variables such as familiarity, concreteness, and imageability. Existing datasets for multiple languages also include linguistic variables such as the length and the frequency of lemmas in different corpora. There are substantial sets of normative values for English, but for other languages, such as Italian, the coverage is scarce. In this paper, a set of regression experiments investigates whether it is possible to guess the age of acquisition of Italian lemmas that have not been previously rated by humans. An intrinsic evaluation is proposed, correlating estimated Italian lemmas’ AoA with English lemmas’ AoA. An extrinsic evaluation - using AoA values as features for the classification of literary excerpts labeled by age appropriateness - shows how essential lexical coverage is for this task.
2018
The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages
Claudia Soria | Valeria Quochi | Irene Russo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
LREC as a Graph: People and Resources in a Network
Riccardo Del Gratta | Francesca Frontini | Monica Monachini | Gabriella Pardelli | Irene Russo | Roberto Bartolini | Fahad Khan | Claudia Soria | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.
Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
Claudia Soria | Irene Russo | Valeria Quochi | Davyth Hicks | Antton Gurrutxaga | Anneli Sarhimaa | Matti Tuomisto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Poor digital representation of minority languages further prevents their usability on digital media and devices. The Digital Language Diversity Project, a three-year project funded under the Erasmus+ programme, aims at addressing the problem of low digital representation of EU regional and minority languages by giving their speakers the intellectual and practical skills to create, share, and reuse online digital content. Availability of digital content and technical support to use it are essential prerequisites for the development of language-based digital applications, which in turn can boost digital usage of these languages. In this paper we introduce the project, its aims, objectives and current activities for sustaining digital usability of minority languages through adult education.
2015
SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events
Irene Russo | Tommaso Caselli | Carlo Strapparava
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
From Synsets to Videos: Enriching ItalWordNet Multimodally
Roberto Bartolini | Valeria Quochi | Irene De Felice | Irene Russo | Monica Monachini
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The paper describes the multimodal enrichment of ItalWordNet action verb entries by means of an automatic mapping with an ontology of action types instantiated by video scenes (ImagAct). The two resources present important differences as well as interesting complementary features, such that a mapping of these two resources can lead to an enrichment of IWN, through the connection between synsets and videos apt to illustrate the meaning described by glosses. Here, we describe an approach inspired by ontology matching methods for the automatic mapping of ImagAct video scenes onto ItalWordNet senses. The experiments described in the paper are conducted on Italian, but the same methodology can be extended to other languages for which WordNets have been created, since ImagAct is also available for English, Chinese and Spanish. This source of multimodal information can be exploited to design second language learning tools, as well as for language grounding in video action recognition and potentially for robotics.
2013
From Glosses to Qualia: Qualia Extraction from Senso Comune
Tommaso Caselli | Irene Russo
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)
Disambiguation of Basic Action Types through Nouns’ Telic Qualia
Irene Russo | Francesca Frontini | Irene De Felice | Fahad Khan | Monica Monachini
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)
2012
Customizable SCF Acquisition in Italian
Tommaso Caselli | Francesco Rubino | Francesca Frontini | Irene Russo | Valeria Quochi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Lexica of predicate-argument structures constitute a useful tool for several tasks in NLP. This paper describes a web-service system for automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system: the first by combining information from two lexica (one manually created and the other automatically acquired) with manual exploration of corpus data, and the second by annotating data extracted from a specialized corpus (environmental domain). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). The evaluation phase has allowed us to identify the best empirical MLE threshold for the creation of a lexicon (P=0.653, R=0.557, F1=0.601). In addition to this, we assigned to the extracted entries of the lexicon a confidence score based on the relative frequency and evaluated the extractor on domain-specific data. The confidence score will allow the final user to easily select the entries of the lexicon in terms of their reliability: one of the most interesting features of this work is the possibility the final users have to customize the results of the SCF extractor, obtaining different SCF lexica in terms of size and accuracy.
The Language Library: supporting community effort for collective resource production
Riccardo Del Gratta | Francesca Frontini | Francesco Rubino | Irene Russo | Nicoletta Calzolari
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Relations among phenomena at different linguistic levels are at the essence of language properties, but today we focus mostly on one specific linguistic layer at a time, without (having the possibility of) paying attention to the relations among the different layers. At the same time, our efforts are too scattered, without much possibility of exploiting other people's achievements. To address the complexities hidden in multilayer interrelations, even small amounts of processed data can be useful, improving the performance of complex systems. Exploiting the current trend towards sharing, we want to initiate a collective movement that works towards creating synergies and harmonisation among different annotation efforts that are now dispersed. In this paper we present the general architecture of the Language Library, an initiative which is conceived as a facility for gathering and making available through simple functionalities the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the LRT community. In order to reach this goal, a first population round of the Language Library has started around a core of parallel/comparable texts that have been annotated by several contributors submitting a paper for LREC2012. The Language Library also has an ancillary aim related to language documentation and archiving, and it is conceived as a theory-neutral space which allows for several language processing philosophies to coexist.
The IMAGACT Cross-linguistic Ontology of Action. A new infrastructure for natural language disambiguation
Massimo Moneglia | Monica Monachini | Omar Calabrese | Alessandro Panunzi | Francesca Frontini | Gloria Gagliardi | Irene Russo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Action verbs, which are highly frequent in speech, cause disambiguation problems that are relevant to Language Technologies. This is a consequence of the peculiar way each natural language categorizes Action, i.e. it is a consequence of semantic factors. Action verbs are frequently general, since they extend productively to actions belonging to different ontological types. Moreover, each language categorizes action in its own way and therefore the cross-linguistic reference to everyday activities is puzzling. This paper briefly sketches the IMAGACT project, which aims at setting up a cross-linguistic Ontology of Action for grounding disambiguation tasks in this crucial area of the lexicon. The project derives information on the actual variation of action verbs in English and Italian from spontaneous speech corpora, where references to action are high in frequency. Crucially, it makes use of the universal language of images to identify action types, avoiding the underdeterminacy of semantic definitions. Action concept entries are implemented as prototypic scenes; this will make it easier to extend the Ontology to other languages.
The LRE Map. Harmonising Community Descriptions of Resources
Nicoletta Calzolari | Riccardo Del Gratta | Gil Francopoulo | Joseph Mariani | Francesco Rubino | Irene Russo | Claudia Soria
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Accurate and reliable documentation of Language Resources is an undisputable need: documentation is the gateway to discovery of Language Resources, a necessary step towards promoting the data economy. Language resources that are not documented virtually do not exist: for this reason every initiative able to collect and harmonise metadata about resources represents a valuable opportunity for the NLP community. In this paper we describe the LRE Map, reporting statistics on resources associated with LREC2012 papers and providing comparisons with LREC2010 data. The LRE Map, jointly launched by FLaReNet and ELRA in conjunction with the LREC 2010 Conference, is an instrument for enhancing availability of information about resources, either new or already existing ones. It aims to reinforce and facilitate the use of standards in the community. The LRE Map web interface makes it possible to search according to a fixed set of metadata and to view the details of extracted resources. The LRE Map is continuing to collect bottom-up input about resources from authors of other conferences through the standard submission process. This will help broaden the notion of language resources and attract to the field neighboring disciplines that so far have been only marginally involved by the standard notion of language resources.
Assigning Connotation Values to Events
Tommaso Caselli | Irene Russo | Francesco Rubino
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Sentiment Analysis (SA) and Opinion Mining (OM) have become popular tasks in NLP in recent years, with the development of language resources, corpora and annotation schemes. The possibility to discriminate between objective and subjective expressions contributes to the identification of a document's semantic orientation and to the detection of the opinions and sentiments expressed by the authors or attributed to other participants in the document. Subjectivity word sense disambiguation helps in this task, automatically determining which word senses in a corpus are being used subjectively and which are being used objectively. This paper reports on a methodology to assign, in a semi-automatic way, connotative values to eventive nouns usually labelled as neutral, through syntagmatic patterns that express cause-effect relations between emotion cause events and emotion words. We have applied our method to nouns and we have been able to reduce the number of OBJ polarity values associated with event nouns.
Verb interpretation for basic action types: annotation, ontology induction and creation of prototypical scenes
Francesca Frontini | Irene De Felice | Fahad Khan | Irene Russo | Monica Monachini | Gloria Gagliardi | Alessandro Panunzi
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon
2011
EMOCause: An Easy-adaptable Approach to Extract Emotion Cause Contexts
Irene Russo | Tommaso Caselli | Francesco Rubino | Ester Boldrini | Patricio Martínez-Barco
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)
The Language Library: Many Layers, More Knowledge
Nicoletta Calzolari | Riccardo Del Gratta | Francesca Frontini | Irene Russo
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
2010
The LREC Map of Language Resources and Technologies
Nicoletta Calzolari | Claudia Soria | Riccardo Del Gratta | Sara Goggi | Valeria Quochi | Irene Russo | Khalid Choukri | Joseph Mariani | Stelios Piperidis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper we present the LREC Map of Language Resources and Tools, an innovative feature introduced with this LREC. The purpose of the Map is to shed light on the vast amount of resources and tools that represent the background of the research presented at LREC, in the attempt to fill in a gap in the community knowledge about the resources and tools that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the documentation task about resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, the metadata used to describe the resources, and finally presents some of the most relevant findings.
Discovering Polarity for Ambiguous and Objective Adjectives through Adverbial Modification
Irene Russo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The field of opinion mining has emerged in recent years as an exciting challenge for computational linguistics: investigating how humans express subjective judgments through linguistic means paves the way for automatic recognition and summarization of opinionated texts, with the possibility of determining the polarities and strengths of opinions asserted. Sentiment lexicons are basic resources for investigating the orientation of a text, which can be done by considering the polarized words included in it, but they encode the polarity of word types instead of the polarity of word tokens. The expression of an opinion through the choice of lexical items is context-sensitive, and sentiment lexicons could be integrated with syntagmatic patterns that emerge as significant in statistical analyses. This paper proposes a corpus analysis of adverbially modified ambiguous adjectives (e.g. fast, rich) and objective adjectives (e.g. chemical, political) that can occasionally be exploited to express subjective judgments. Comparing the polarity encoded in sentiment lexicons with the results of a logistic regression analysis, the role of adverbial cues for polarity detection will be evaluated on the basis of a small sample of manually annotated sentences.