Fabrizio Sebastiani


2016

SemEval-2016 Task 4: Sentiment Analysis in Twitter
Preslav Nakov | Alan Ritter | Sara Rosenthal | Fabrizio Sebastiani | Veselin Stoyanov
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

QCRI at SemEval-2016 Task 4: Probabilistic Methods for Binary and Ordinal Quantification
Giovanni Da San Martino | Wei Gao | Fabrizio Sebastiani
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

The Challenge of Sentiment Quantification
Fabrizio Sebastiani
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

Text Quantification
Fabrizio Sebastiani
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In recent years it has been pointed out that, in a number of applications involving (text) classification, the final goal is not to determine which class (or classes) individual unlabelled data items belong to, but to determine the prevalence (or "relative frequency") of each class in the unlabelled data. The latter task is known as quantification. Assume a market research agency runs a poll in which it asks the question "What do you think of the recent ad campaign for product X?" Once the poll is complete, the agency may want to classify the resulting textual answers according to whether or not they belong to the class LovedTheCampaign. The agency is likely not interested in whether a specific individual belongs to the class LovedTheCampaign, but in knowing how many respondents belong to it, i.e., in knowing the prevalence of the class. In other words, the agency is interested not in classification, but in quantification. Essentially, quantification is classification tackled at the aggregate (rather than at the individual) level.
The research community has recently shown a growing interest in tackling quantification as a task in its own right. One reason is that, since the goal of quantification differs from that of classification, quantification requires evaluation measures different from those used for classification. A second, related reason is that using a method optimized for classification accuracy is suboptimal when quantification accuracy is the real goal. A third reason is the growing awareness that quantification is going to become more and more important; with the advent of big data, more and more application contexts are going to spring up in which analyzing data at the aggregate (rather than at the individual) level will be all that is needed.
The goal of this tutorial is to introduce the audience to the problem of quantification, to the techniques that have been proposed for solving it, to the metrics used to evaluate them, and to the problems that are still open in the area.
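As a rough illustration of the distinction drawn in the abstract, the sketch below estimates class prevalence by simply counting the labels a classifier has already assigned (often called "classify and count"), and then scores the estimate at the aggregate level. The function names, the label "Other", and the toy predictions are assumptions made for this example; they are not taken from the tutorial, and the abstract itself notes that a method optimized for classification accuracy is suboptimal for this purpose.

```python
from collections import Counter


def classify_and_count(predicted_labels):
    """Estimate class prevalence as the relative frequency of predicted labels.

    This is the naive baseline: it trusts the per-item predictions and
    simply aggregates them.
    """
    if not predicted_labels:
        raise ValueError("need at least one predicted label")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


def absolute_error(true_prevalence, estimated_prevalence):
    """Aggregate-level error: gap between true and estimated prevalence of one class."""
    return abs(true_prevalence - estimated_prevalence)


# Hypothetical classifier output for five poll answers (illustrative only).
predictions = ["LovedTheCampaign", "Other", "LovedTheCampaign", "Other", "Other"]
prevalence = classify_and_count(predictions)

# Suppose (hypothetically) that 60% of respondents really loved the campaign.
error = absolute_error(0.6, prevalence["LovedTheCampaign"])
```

Note that the error is computed on the estimated proportion, not on individual decisions: two classifiers with the same per-item accuracy can yield very different prevalence estimates, depending on how their false positives and false negatives balance out.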

2010

ISTI@SemEval-2 Task 8: Boosting-Based Multiway Relation Classification
Andrea Esuli | Diego Marcheggiani | Fabrizio Sebastiani
Proceedings of the 5th International Workshop on Semantic Evaluation

SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining
Stefano Baccianella | Andrea Esuli | Fabrizio Sebastiani
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this work we present SENTIWORDNET 3.0, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications. SENTIWORDNET 3.0 is an improved version of SENTIWORDNET 1.0, a lexical resource publicly available for research purposes, currently licensed to more than 300 research groups and used in a variety of research projects worldwide. Both SENTIWORDNET 1.0 and 3.0 are the result of automatically annotating all WORDNET synsets according to their degrees of positivity, negativity, and neutrality. SENTIWORDNET 1.0 and 3.0 differ (a) in the versions of WORDNET which they annotate (WORDNET 2.0 and 3.0, respectively), and (b) in the algorithm used for automatically annotating WORDNET, which now includes (in addition to the previous semi-supervised learning step) a random-walk step for refining the scores. We here discuss SENTIWORDNET 3.0, focussing especially on the improvements concerning aspect (b) that it embodies with respect to version 1.0. We also report the results of evaluating SENTIWORDNET 3.0 against a fragment of WORDNET 3.0 manually annotated for positivity, negativity, and neutrality; these results indicate accuracy improvements of about 20% with respect to SENTIWORDNET 1.0.

2008

Annotating Expressions of Opinion and Emotion in the Italian Content Annotation Bank
Andrea Esuli | Fabrizio Sebastiani | Ilaria Urciuoli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe the result of manually annotating I-CAB, the Italian Content Annotation Bank, with expressions of private state (EPSs), i.e., expressions that denote the presence of opinions, emotions, and other cognitive states. The aim of this effort was to generate a standard resource for supporting the development of opinion extraction algorithms for Italian, and a benchmark for testing such algorithms. To this end we have employed a previously existing annotation language (here dubbed WWC, from the initials of its proponents). We here describe the results of this annotation effort, including the results of a thorough inter-annotator agreement test. We conclude by discussing how WWC can be adapted to the specificities of a Romance language such as Italian.

2007

PageRanking WordNet Synsets: An Application to Opinion Mining
Andrea Esuli | Fabrizio Sebastiani
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining
Andrea Esuli | Fabrizio Sebastiani
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PN-polarity” of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been instead much scarcer. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.
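One simple way a committee of ternary classifiers could be combined into three scores is sketched below. It assumes, purely for illustration, that each score is the fraction of committee members that voted for the corresponding label; the abstract says only that the classifiers' results are "combined", so the combination rule, the function name, and the example votes are all assumptions of this sketch.

```python
LABELS = ("Obj", "Pos", "Neg")


def synset_scores(committee_votes):
    """Turn the labels assigned by a classifier committee into Obj/Pos/Neg scores.

    Assumed rule (not stated in the abstract): each score is the proportion
    of committee members that assigned that label, so the three scores sum to 1.
    """
    if not committee_votes:
        raise ValueError("need at least one vote")
    unknown = set(committee_votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n = len(committee_votes)
    return {label: committee_votes.count(label) / n for label in LABELS}


# Hypothetical votes of eight ternary classifiers for a single synset.
votes = ["Pos", "Pos", "Pos", "Pos", "Pos", "Obj", "Obj", "Neg"]
scores = synset_scores(votes)
```

Under this assumed rule a synset receives a graded profile rather than a single hard label, which matches the abstract's description of scores that say how objective, positive, and negative a synset is.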

Determining Term Subjectivity and Term Orientation for Opinion Mining
Andrea Esuli | Fabrizio Sebastiani
11th Conference of the European Chapter of the Association for Computational Linguistics

2004

An Analysis of the Relative Difficulty of Reuters-21578 Subsets
Franca Debole | Fabrizio Sebastiani
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)