Luca Dini

Also published as: L. Dini


TEXT-CAKE: Challenging Language Models on Local Text Coherence
Luca Dini | Dominique Brunato | Felice Dell’Orletta | Tommaso Caselli
Proceedings of the 31st International Conference on Computational Linguistics

We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.


Emotion Analysis on Twitter: The Hidden Challenge
Luca Dini | André Bittar
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present an experiment to detect emotions in tweets. Unlike much previous research, we draw the important distinction between the tasks of emotion detection in a closed world assumption (i.e. every tweet is emotional) and the complicated task of identifying emotional versus non-emotional tweets. Given an apparent lack of appropriately annotated data, we created two corpora for these tasks. We describe two systems, one symbolic and one based on machine learning, which we evaluated on our datasets. Our evaluation shows that a machine learning classifier performs best on emotion detection, while a symbolic approach is better for identifying relevant (i.e. emotional) tweets.


The Dangerous Myth of the Star System
André Bittar | Luca Dini | Sigrid Maurel | Mathieu Ruhlmann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In recent years we have observed two parallel trends in computational linguistics research and e-commerce development. On the research side, there has been an increasing interest in algorithms and approaches that are able to capture the polarity of opinions expressed by users on products, institutions and services. On the other hand, almost all big e-commerce and aggregator sites are by now providing users the possibility of writing comments and expressing their appreciation with a numeric score (usually represented as a number of stars). This generates the impression that the work carried out in the research community is made partially useless (at least for economic exploitation) by an evolution in web practices. In this paper we describe an experiment on a large corpus which shows that the score judgments provided by users are often conflicting with the text contained in the opinion, and to such a point that a rule-based opinion mining system can be demonstrated to perform better than the users themselves in ranking their opinions.

Generating a Resource for Products and Brandnames Recognition. Application to the Cosmetic Domain.
Cédric Lopez | Frédérique Segond | Olivier Hondermarck | Paolo Curtoni | Luca Dini
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Named Entity Recognition task needs high-quality and large-scale resources. In this paper, we present RENCO, a rule-based system focused on the recognition of entities in the Cosmetic domain (brandnames, product names, …). RENCO has two main objectives: 1) Generating resources for named entity recognition; 2) Mining new named entities relying on the previously generated resources. In order to build lexical resources for the cosmetic domain, we propose a system based on local lexico-syntactic rules complemented by a learning module. As the outcome of the system, we generate both a simple lexicon and a structured lexicon. Results of the evaluation show that even if RENCO outperforms a classic Conditional Random Fields algorithm, both systems should combine their respective strengths.


Celi: EDITS and Generic Text Pair Classification
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)


Query log analysis with LangLog
Marco Trevisan | Eduard Barbu | Igor Barsanti | Luca Dini | Nikolaos Lagos | Frédérique Segond | Mathieu Rhulmann | Ed Vald
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

Linguagrid: a network of Linguistic and Semantic Services for the Italian Language.
Alessio Bosca | Luca Dini | Milen Kouylekov | Marco Trevisan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In order to handle the increasing amount of textual information available on the web today and exploit the knowledge latent in this mass of unstructured data, a wide variety of linguistic knowledge and resources (Language Identification, Morphological Analysis, Entity Extraction, etc.) is crucial. In the last decade LRaaS (Language Resource as a Service) emerged as a novel paradigm for publishing and sharing these heterogeneous software resources over the Web. In this paper we present an overview of Linguagrid, a recent initiative that implements an open network of linguistic and semantic Web Services for the Italian language, as well as a new approach for enabling customizable corpus-based linguistic services on the Linguagrid LRaaS infrastructure. A corpus ingestion service allows users to upload corpora of documents and to generate classification/clustering models tailored to their needs by means of standard machine learning techniques applied to the textual contents and metadata from the corpora. The models so generated can then be accessed through proper Web Services and exploited to process and classify new textual contents.

CELI: An Experiment with Cross Language Textual Entailment
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)


The Impact of Grammar Enhancement on Semantic Resources Induction
Luca Dini | Giampaolo Mazzini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the effects of the evolution of an Italian dependency grammar on a task of multilingual FrameNet acquisition. The task is based on the creation of virtual English/Italian parallel annotation corpora, which are then aligned at dependency level by using two manually encoded grammar-based dependency parsers. We show how the evolution of the LAS (Labeled Attachment Score) metric for the considered grammar has a direct impact on the quality of the induced FrameNet, thus proving that the evolution of the quality of syntactic resources is mirrored by an analogous evolution in semantic ones. In particular we show that an improvement of 30% in LAS causes an improvement of precision for the induced resource ranging from 5% to 10%, depending on the type of evaluation.


Multilingual Search in Libraries. The case-study of the Free University of Bozen-Bolzano
R. Bernardi | D. Calvanese | L. Dini | V. Di Tomaso | E. Frasnelli | U. Kugler | B. Plank
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents an ongoing project aiming at enhancing the OPAC (Online Public Access Catalog) search system of the Library of the Free University of Bozen-Bolzano with multilingual access. The multilingual search system (MUSIL) we have developed integrates advanced linguistic technologies in a user-friendly interface and bridges the gap between the world of free text search and the world of conceptual librarian search. In this paper we present the architecture of the system, its interface and preliminary evaluations of the precision of the search results.


SiSSA: An Infrastructure for Developing NLP Applications
Alberto Lavelli | Fabio Pianesi | Ermanno Maci | Irina Prodanof | Luca Dini | Giampaolo Mazzini
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


SiSSA - An Infrastructure for NLP Application Development
Alberto Lavelli | F. Pianesi | E. Maci | I. Prodanof | L. Dini | G. Mazzini
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources


Error Driven Word Sense Disambiguation
Luca Dini | Vittorio Di Tomaso | Frederique Segond
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Error Driven Word Sense Disambiguation
Luca Dini | Vittorio Di Tomaso | Frederique Segond
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1


Natural Language Dialogue Service for Appointment Scheduling Agents
Stephan Busemann | Thierry Declerck | Abdel Kader Diagne | Luca Dini | Judith Klein | Sven Schmeier
Fifth Conference on Applied Natural Language Processing

Hypertextual Grammar Development
Luca Dini | Giampaolo Mazzini
Computational Environments for Grammar Development and Linguistic Engineering


JDII: Parsing Italian with a Robust Constraint Grammar
Andrea Bolioli | Luca Dini | Giovanni Malnati
COLING 1992 Volume 3: The 14th International Conference on Computational Linguistics