UBham: Lexical Resources and Dependency Parsing for Aspect-Based Sentiment Analysis

This paper describes the system developed by the UBham team for the SemEval-2014 Aspect-Based Sentiment Analysis task (Task 4). We present an approach based on deep linguistic processing techniques and resources, explore the parameter space of these techniques at the different stages of the task, and examine ways to exploit the interdependencies between the stages.


Introduction
Aspect-Based Sentiment Analysis (ABSA) is concerned with detecting the author's sentiment towards different issues discussed in a document, such as aspects or features of a product in a customer review. The specific ABSA scenario we address in this paper is as follows. Given a sentence from a review, identify (1) aspect terms, i.e., specific words or multiword expressions denoting aspects of the product; (2) aspect categories, i.e., categories of the issues being commented on; (3) aspect term polarity, the polarity of the sentiment associated with each aspect term; and (4) aspect category polarity, the polarity associated with each aspect category found in the sentence. For example, in "I liked the service and the staff, but not the food.", the aspect terms are "service", "staff" and "food", where the first two are evaluated positively and the last one negatively; the aspect categories are SERVICE and FOOD, where the former is associated with positive sentiment and the latter with negative sentiment. Note that a given sentence may contain one, several, or no aspect terms and one, several, or no aspect categories, and may express positive, negative, neutral, or conflicting sentiment.
This work is licenced under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/ The research was partially supported by FP7 ICT project "Workbench for Interactive Contrastive Analysis of Patent Documentation" under grant no. FP7-SME-606163.
While ABSA is usually studied at the level of whole documents (e.g., online reviews), the scenario addressed here is characterized by short input texts, complex categorization schemes, and a limited amount of annotated data. We therefore focused on ways to exploit deep linguistic processing techniques, which we use both for creating complex classification features and for rule-based processing.
Related Work

Aspect Term Extraction
To recognize terms that express key notions in a product or service review, a common approach has been to extract nouns and noun phrases as potential terms and then apply a filtering technique so that only the most relevant terms remain. Filtering techniques include statistical association tests (Yi et al., 2003), association rule mining with additional rule-based post-processing steps (Hu and Liu, 2004), and measures of association with certain pre-defined classes of words, such as part-whole relation indicators (Popescu and Etzioni, 2005).

Aspect Category Recognition
Aspect category recognition is often addressed as a text classification problem, where a classifier is learned from reviews manually tagged for aspects (e.g., Snyder and Barzilay, 2007; Ganu et al., 2009). Titov and McDonald (2008) present an approach which jointly detects aspect categories and their sentiment using a classifier trained on topics discovered via Multi-Grain LDA and on star ratings available in the training data. Zhai et al. (2010) present an approach based on Expectation-Maximization to group aspect expressions into user-defined aspect categories.

Sentence Sentiment
Lexicon-based approaches to detecting sentiment in a sentence rely on a lexicon in which words and phrases are provided with sentiment labels, as well as on techniques to recognize "polarity shifters", i.e., phrases that cause the polarity of a lexical item to reverse. Early work on the detection of polarity shifters used surface-level patterns (Yu and Hatzivassiloglou, 2003; Hu and Liu, 2004). Moilanen and Pulman (2007) provide a logic-oriented framework to compute the polarity of grammatical structures, which is capable of dealing with phenomena such as sentiment propagation, polarity reversal, and polarity conflict. Several papers have looked at different ways to use syntactic structure in a machine learning framework to better account for negations and their scope (Nakagawa et al., 2010; Socher et al., 2013).
To adapt a generic sentiment lexicon to a new application domain, previous work has exploited semantic relations encoded in WordNet (Kim and Hovy, 2006), unannotated data (Li et al., 2012), or queries to a search engine (Taboada et al., 2006).

Our Approach
In the following sections, we describe our approach to each stage of the Shared Task, reporting experiments on the provided training data using 10-fold cross-validation.

Aspect Term Extraction
During pre-processing, the training data was parsed using a dependency parser (Bohnet and Nivre, 2012), and sentiment words were recognized in it using a sentiment lexicon (described under Aspect Term Sentiment below). Candidate terms were extracted as single nouns, noun phrases, adjectives and verbs, enforcing certain exceptions detailed in the annotation guidelines for the Shared Task (Pontiki et al., 2014), namely:
• Sentiment words were not allowed as part of terms;
• Noun phrases with all elements capitalized, as well as acronyms, were excluded, under the assumption that they refer to brands rather than product aspects;
• Nouns referring to the product class as a whole ("restaurant", "laptop", etc.) were excluded.
Candidate terms that exactly matched manually annotated terms were discarded, while all other candidates were used as negative examples of aspect terms.
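The candidate filtering and labeling steps can be sketched as follows. This is a minimal illustration rather than the system's actual code: the toy sentiment and product-class word lists, our reading of the brand rule, and the helper names (`keep_candidate`, `build_training_examples`) are assumptions, and we assume that the manually annotated terms themselves serve as the positive examples.

```python
# Hypothetical toy resources; the real system used a full sentiment lexicon.
SENTIMENT_WORDS = {"great", "awful", "delicious", "slow"}
PRODUCT_CLASS_NOUNS = {"restaurant", "place", "laptop", "computer"}


def is_brand_like(tokens):
    """Assumed reading of the brand rule: a multiword phrase whose words are
    all capitalized, or a single all-uppercase token (an acronym)."""
    if len(tokens) > 1:
        return all(t[:1].isupper() for t in tokens)
    return len(tokens[0]) > 1 and tokens[0].isupper()


def keep_candidate(tokens):
    """Apply the three exclusion rules to a candidate term (a list of tokens)."""
    if any(t.lower() in SENTIMENT_WORDS for t in tokens):
        return False  # sentiment words may not be part of a term
    if is_brand_like(tokens):
        return False  # likely a brand rather than a product aspect
    if len(tokens) == 1 and tokens[0].lower() in PRODUCT_CLASS_NOUNS:
        return False  # refers to the product class as a whole
    return True


def build_training_examples(tokens, candidate_spans, gold_spans):
    """Return (term_tokens, label) pairs for one sentence.

    Spans are (start, end) token offsets. Annotated terms are assumed to be
    the positives; candidates that exactly match an annotated term are
    discarded, and the remaining filtered candidates become negatives.
    """
    examples = [(tokens[s:e], 1) for s, e in sorted(gold_spans)]
    for s, e in candidate_spans:
        if (s, e) in gold_spans:
            continue
        if keep_candidate(tokens[s:e]):
            examples.append((tokens[s:e], 0))
    return examples
```

For "the battery life of this laptop is great" with "battery life" annotated, the candidate "life" becomes a negative example, while "laptop" and "great" are filtered out.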
In order to provide the term extraction process with additional lexical knowledge, we extracted from the training data those manually annotated terms that corresponded to a single aspect category. The set of terms belonging to each category was then augmented using WordNet. First, we determined the 5 most prominent hypernyms of these terms in the WordNet hierarchy using Resnik's (1992) algorithm for learning the class in a semantic hierarchy that best represents the selectional preferences of a verb, additionally requiring each hypernym to be at least 7 nodes away from the root so that it is sufficiently specific. We then obtained all lexical items belonging to the child synsets of these hypernyms and further extended them with their meronyms and morphological derivatives. The resulting set of lexical items was used as an extended aspect term lexicon. We additionally created a list of all individual lemmas of content words found in this lexicon.
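The lexicon expansion can be illustrated on a toy hierarchy. The sketch below is hypothetical: a child-to-parent dictionary stands in for WordNet (which allows multiple inheritance), and hypernyms are ranked by a simple count of covered seed terms instead of Resnik's (1992) selectional-association measure. It would be run once per aspect category, with that category's annotated terms as seeds; `k=5` and `min_depth=7` mirror the values given above.

```python
from collections import Counter, defaultdict


def ancestors(node, parent):
    """Hypernym chain of `node`, nearest first (single inheritance assumed)."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def top_hypernyms(seeds, parent, k=5, min_depth=7):
    """Rank hypernyms by how many seed terms they cover, keeping only those at
    least `min_depth` edges below the root. A simple stand-in for Resnik's
    association score, not a reimplementation of it."""
    counts = Counter()
    for seed in seeds:
        for hyper in ancestors(seed, parent):
            if len(ancestors(hyper, parent)) >= min_depth:
                counts[hyper] += 1
    return [h for h, _ in counts.most_common(k)]


def expand_lexicon(seeds, parent, meronyms, derivations, k=5, min_depth=7):
    """Seeds + children of the top hypernyms + their meronyms and derivatives.

    Returns the lexicon and its individual word lemmas (items are assumed to be
    lemmatized already; function words are not filtered in this sketch).
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    lexicon = set(seeds)
    for hyper in top_hypernyms(seeds, parent, k, min_depth):
        lexicon.update(children[hyper])
    for item in list(lexicon):
        lexicon.update(meronyms.get(item, ()))
        lexicon.update(derivations.get(item, ()))
    lemmas = {word for item in lexicon for word in item.split()}
    return lexicon, lemmas
```

With the real resource, NLTK's WordNet interface exposes the needed relations (`Synset.hypernyms`, `Synset.part_meronyms`, `Synset.min_depth`, `Lemma.derivationally_related_forms`).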
For each term, we extracted the following features for automatic classification:
• Normalized form: the surface form of the term after normalization;
• Term lemmas: lemmas of content words found in the term;
• Lexicon term: whether the term is in the lexicon;
• Lexicon lemmas ratio: the ratio of lexicon lemmas in the term;
• Unigrams: 3 unigrams on either side of the term;
• Bigrams: the two bigrams around the term;
• Adj+term: whether an adjective depends on the term or is related to it via a link verb ("be", "get", "become", etc.);
• Sentiment+term: whether a sentiment word depends on the term or is related to it via a link verb;
• Be+term: whether the term depends on a link verb;
• Subject term: whether the term is a subject;
• Object term: whether the term is an object.
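As an illustration, these features could be computed from a dependency-parsed sentence roughly as follows. The `Token` structure, the CoNLL-2008-style labels (`SBJ`, `OBJ`), the extended link-verb list and the padding symbols are assumptions made for this sketch; the parser's actual output format may differ.

```python
from dataclasses import dataclass

# Assumption: "seem" and "look" extend the paper's "be", "get", "become", etc.
LINK_VERBS = {"be", "get", "become", "seem", "look"}
CONTENT_POS = ("NN", "VB", "JJ", "RB")  # Penn Treebank tag prefixes


@dataclass
class Token:
    form: str
    lemma: str
    pos: str     # Penn Treebank tag
    head: int    # index of the governing token, -1 for the root
    deprel: str  # CoNLL-2008-style label, e.g. "SBJ", "OBJ", "NMOD", "PRD"


def term_features(sent, span, lexicon, lexicon_lemmas, sentiment_words):
    """Sparse feature dict for the candidate term sent[s:e]."""
    s, e = span
    forms = [t.form.lower() for t in sent]
    norm = " ".join(forms[s:e])
    lemmas = [t.lemma for t in sent[s:e] if t.pos.startswith(CONTENT_POS)]
    # Syntactic head of the term: the token whose governor lies outside the span.
    h = next(i for i in range(s, e) if not s <= sent[i].head < e)
    v = sent[h].head
    linked = v >= 0 and sent[v].lemma in LINK_VERBS
    # Words depending on the term, plus co-dependents of a governing link verb.
    related = [j for j, t in enumerate(sent) if s <= t.head < e and not s <= j < e]
    if linked:
        related += [j for j, t in enumerate(sent) if t.head == v and j != h]

    feats = {
        "norm=" + norm: 1,
        "lex_term": int(norm in lexicon),
        "lex_ratio": sum(l in lexicon_lemmas for l in lemmas) / max(len(lemmas), 1),
        "adj_term": int(any(sent[j].pos.startswith("JJ") for j in related)),
        "sent_term": int(any(sent[j].lemma in sentiment_words for j in related)),
        "be_term": int(linked),
        "subj_term": int(sent[h].deprel == "SBJ"),
        "obj_term": int(sent[h].deprel == "OBJ"),
    }
    for lemma in lemmas:
        feats["lemma=" + lemma] = 1
    for k in range(1, 4):  # up to 3 unigrams on either side
        if s - k >= 0:
            feats[f"uni-{k}={forms[s - k]}"] = 1
        if e + k - 1 < len(forms):
            feats[f"uni+{k}={forms[e + k - 1]}"] = 1
    padded = ["<s>", "<s>"] + forms + ["</s>", "</s>"]
    feats["bi-=" + "_".join(padded[s:s + 2])] = 1      # two tokens before the term
    feats["bi+=" + "_".join(padded[e + 2:e + 4])] = 1  # two tokens after the term
    return feats
```

For "the battery life is great", the term "battery life" is the subject of the link verb "is", so `adj_term`, `sent_term`, `be_term` and `subj_term` all fire.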
We first examine how well the manually designed patterns extracted potential terms. At this stage we are primarily interested in recall, since the potential terms are subsequently classified into terms and non-terms by an automatic classifier. Recall was 70.5 on the restaurants dataset and 56.9 on the laptops dataset; these figures are upper limits on recall for the overall task of aspect term recognition. Tables 1 and 2 compare the performance of several learning algorithms on the restaurants and the laptops datasets, respectively.
Table 2: Learning algorithms on the aspect term extraction task, laptops dataset.
On both datasets, linear SVMs performed best, so they were used in the subsequent experiments on term recognition. To examine the contribution of each feature used for term classification, we ran experiments in which a classifier was built and tested without that feature (see Tables 3 and 4 for the restaurants and laptops datasets, respectively); a greater drop in performance relative to the full feature set indicates a more informative feature.
The results show that the three most useful features are the same in both datasets: the occurrence of the candidate term in the constructed aspect term lexicon, the lemmas found in the term, and the normalized form of the term.
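The leave-one-feature-out procedure used for this comparison amounts to the loop below. This is a sketch: `evaluate` is a hypothetical callback standing in for training and testing the classifier on a given feature subset.

```python
def feature_ablation(feature_groups, evaluate):
    """Leave-one-feature-out ablation.

    `evaluate` is a hypothetical callback that trains and tests a classifier
    (e.g. a linear SVM under 10-fold cross-validation) on the given feature
    groups and returns a score such as F1. Returns (group, drop) pairs,
    most informative feature first.
    """
    groups = set(feature_groups)
    full = evaluate(groups)
    drops = {g: full - evaluate(groups - {g}) for g in groups}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

Each feature costs one extra training run, so with eleven features the ablation needs twelve cross-validated runs in total.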
We ran further experiments with several manually selected top-performing features, but none of the configurations produced significant improvements over the whole feature set. Table 5 shows the results of the aspect term extraction evaluation on the test data of the Shared Task (baseline algorithms were provided by the organizers). The results are in line with what could be expected given the upper limits on recall of the pattern-based extraction of candidate terms and the precision and recall of the classifier.
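The upper limit on recall of candidate extraction is the share of annotated terms that appear among the extracted candidates. A minimal sketch, assuming that a term counts as recovered only when its token span matches a candidate span exactly:

```python
def candidate_recall(sentences):
    """Percentage of annotated terms recovered by candidate extraction.

    `sentences` holds (candidate_spans, gold_spans) pairs of (start, end)
    token offsets; a gold term counts as recovered only on an exact span match.
    """
    hits = sum(len(set(cands) & set(gold)) for cands, gold in sentences)
    total = sum(len(set(gold)) for _, gold in sentences)
    return 100.0 * hits / total if total else 0.0
```

Since the classifier can only reject candidates, no term missing from the candidate set can be recovered later, which is why this figure bounds the recall of the whole pipeline.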

Aspect Category Recognition
To recognize aspect categories in a sentence, we classified the individual clauses found in it, assuming that each aspect category would be discussed in a separate clause. The features used for classification were lemmas of content words; to account for the fact that aspect terms are more indicative of aspect categories than other words, we additionally used entire terms as features, weighting them twice as much as other features. Table 6 compares the performance of several learning algorithms when automatically recognized aspect terms were not used as an additional feature; Table 7 shows the results when terms were used as features.
Table 7: Learning algorithms on the aspect category recognition task, aspect terms weighted.
The addition of aspect terms as separate features increased F-scores for all the learning methods, sometimes by as much as 5%. Based on these results, we used the linear SVM method for the task submission.
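The clause-level feature weighting and the combination of clause predictions might look like this. The sketch assumes each clause is represented by its content-word lemmas and the aspect terms recognized in it; `classify` is a hypothetical stand-in for the trained linear SVM, and the `lemma=`/`term=` prefixes simply keep the two feature types apart.

```python
from collections import Counter


def clause_features(content_lemmas, aspect_terms, term_weight=2.0):
    """Bag of content-word lemmas plus whole aspect terms at double weight."""
    feats = Counter()
    for lemma in content_lemmas:
        feats["lemma=" + lemma] += 1.0
    for term in aspect_terms:
        feats["term=" + term.lower()] += term_weight
    return dict(feats)


def sentence_categories(clauses, classify):
    """Union of the categories predicted for the clauses of one sentence.

    `clauses` holds (content_lemmas, aspect_terms) pairs; `classify` is a
    hypothetical trained clause classifier mapping a feature dict to one
    category label.
    """
    return {classify(clause_features(lemmas, terms)) for lemmas, terms in clauses}
```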

Aspect Term Sentiment
To recognize sentiment in a sentence, we take a lexicon-based approach. The sentiment lexicon we used encodes the lemma, the part-of-speech tag, and the polarity of each sentiment word. It was built by combining three resources: lemmas from SentiWordNet (Baccianella et al., 2010) that belong to no more than 3 synsets; the General Inquirer lexicon (Stone et al., 1966); and a subsection of the Roget thesaurus annotated for sentiment (Heng, 2004). In addition, we added sentiment expressions characteristic of the restaurant and laptop domains, obtained through manual analysis of the restaurants corpus used by Snyder and Barzilay (2007) and the laptop reviews corpus used by Jindal and Liu (2008).
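Merging the resources into a single (lemma, part-of-speech) lexicon could be sketched as follows. The input format (SentiWordNet entries already reduced to a polarity label plus a synset count) and the conflict-resolution order (earlier generic resources win, domain-specific entries override everything) are assumptions; the paper does not state how conflicting labels were resolved.

```python
def build_lexicon(swn, general_inquirer, roget, domain, max_synsets=3):
    """Merge sentiment resources into one (lemma, pos) -> polarity map.

    `swn` maps (lemma, pos) -> (polarity, number_of_synsets); the other
    resources map (lemma, pos) -> polarity.
    """
    lexicon = {}
    for key, (polarity, n_synsets) in swn.items():
        if n_synsets <= max_synsets:  # skip highly polysemous SentiWordNet lemmas
            lexicon[key] = polarity
    for resource in (general_inquirer, roget):
        for key, polarity in resource.items():
            lexicon.setdefault(key, polarity)  # assumption: earlier resources win
    lexicon.update(domain)  # assumption: domain entries override generic ones
    return lexicon
```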
To detect negated sentiment, we used a list of negating phrases such as "not" and "never", and two types of patterns to determine the scope of a negation. The first type detected negation at the sentence level by checking for negating phrases at the start of the sentence; negations detected at the sentence level were propagated to the clause level. The second type detected negated sentiment within a clause, using patterns specific to the part of speech of the sentiment word (e.g., "AUXV + negation + VB + MAINV", where MAINV is a sentiment verb). The output of this algorithm is the sentence split into clauses, each clause being assigned one of four sentiment labels: "positive", "negative", "neutral", or "conflict". Each aspect term was then assigned the sentiment of the clause it appeared in.
On the test data of the Shared Task, the algorithm achieved accuracy scores of 76.0 on the restaurants data (baseline: 64.3) and 63.6 on the laptops data (baseline: 51.1).
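The clause labeling and term assignment can be approximated by the simplified sketch below. It deliberately replaces the part-of-speech-specific patterns described above with a plain rule (a negator within three tokens before a sentiment word flips its polarity), and the negator list is a hypothetical subset; it only shows how sentence-level negation, clause-level negation and the four labels fit together.

```python
# Assumed partial list; the paper's list of negating phrases is longer.
NEGATORS = {"not", "n't", "never", "no"}


def clause_polarity(clause, lexicon, sentence_negated=False, window=3):
    """Label a clause "positive", "negative", "neutral" or "conflict".

    `clause` is a list of (lemma, pos) pairs and `lexicon` maps (lemma, pos)
    to "positive" or "negative". A fixed window of preceding tokens stands in
    for the paper's part-of-speech-specific negation patterns.
    """
    found = set()
    for i, (lemma, pos) in enumerate(clause):
        polarity = lexicon.get((lemma, pos))
        if polarity not in ("positive", "negative"):
            continue
        local = any(l in NEGATORS for l, _ in clause[max(0, i - window):i])
        if sentence_negated or local:  # flip once if either negation applies
            polarity = "negative" if polarity == "positive" else "positive"
        found.add(polarity)
    if found == {"positive", "negative"}:
        return "conflict"
    return found.pop() if found else "neutral"


def term_sentiments(clauses, terms_by_clause, lexicon):
    """Give every aspect term the label of the clause it appears in."""
    # Sentence-level negation: a negating word opens the sentence.
    negated = bool(clauses and clauses[0]) and clauses[0][0][0] in NEGATORS
    labels = {}
    for clause, terms in zip(clauses, terms_by_clause):
        label = clause_polarity(clause, lexicon, negated)
        for term in terms:
            labels[term] = label
    return labels
```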

Category Sentiment
Recall that aspect categories were recognized in a sentence by classifying its individual clauses. The sentiment of a category was determined from the sentiment of the clauses in which the category was found. If more than one clause was assigned to the same category, with at least one clause expressing positive sentiment and at least one expressing negative sentiment, the category was labeled as having conflicting sentiment. This method achieved an accuracy of 72.8 on the restaurants data, against a baseline of 65.65.
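A sketch of this category-level aggregation follows. Only the positive-plus-negative rule comes from the description above; letting a single non-neutral clause label win over neutral ones, and treating a clause already labeled "conflict" as making the category conflicting, are our assumptions.

```python
from collections import defaultdict


def category_sentiment(clause_results):
    """Map each aspect category to one polarity label.

    `clause_results` holds (category, clause_label) pairs for one sentence,
    where clause_label is "positive", "negative", "neutral" or "conflict".
    """
    labels = defaultdict(set)
    for category, label in clause_results:
        labels[category].add(label)
    result = {}
    for category, seen in labels.items():
        # positive + negative is the stated rule; a "conflict" clause is assumed
        if "conflict" in seen or {"positive", "negative"} <= seen:
            result[category] = "conflict"
        elif "positive" in seen:  # assumption: non-neutral beats neutral
            result[category] = "positive"
        elif "negative" in seen:
            result[category] = "negative"
        else:
            result[category] = "neutral"
    return result
```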

Conclusion
Our study has shown that aspect terms can be detected with high accuracy using a domain lexicon derived from WordNet and a set of classification features created with the help of deep linguistic processing techniques. However, the overall accuracy of aspect term recognition is strongly affected by the patterns used to extract the initial candidate terms. We also found that automatically extracted aspect terms are useful features in the aspect category recognition task. With regard to sentiment detection, our results suggest that reasonable performance can be achieved with a lexicon-based approach coupled with carefully designed rules for detecting polarity shifts.