Multi-dimensional abstractness in cross-domain mappings

Metaphor is a cognitive process that shapes abstract target concepts by mapping them to concrete source concepts. Thus, many computational approaches to metaphor make reference, directly or indirectly, to the abstractness of words and concepts. The property of abstractness, however, remains theoretically and empirically unexplored. This paper implements a multi-dimensional definition of abstractness and tests the usefulness of each dimension for detecting cross-domain mappings.


Introduction
The idea of metaphor as cross-domain mapping goes back, at least, to Black (1954), who made explicit an earlier implicit view that linguistic metaphors depend upon non-linguistic (i.e., conceptual) connections between networks of concepts. Black's premises were later employed to represent groups of related linguistic metaphoric expressions using non-linguistic conceptual metaphors (for example, Reddy, 1979, and Lakoff & Johnson, 1980). Inherent in this approach to representing metaphor is the idea that metaphor is, at its core, a matter of cross-domain mapping (e.g., Lakoff, 1993); in other words, metaphor is a cognitive process that builds or maps connections between networks of concepts. The study of cognitive metaphor processes has largely focused on content-specific representations of such mappings within a number of content domains, such as TIME and IDEAS. Thus, a cross-domain mapping may be represented as something like ARGUMENT IS WAR. Computational approaches to metaphor, however, have represented cross-domain mappings using higher-level properties like abstractness (Gandy, et al., 2013; Assaf, et al., 2013; Tsvetkov, et al., 2013; Turney, et al., 2011), semantic similarity (Li & Sporleder, 2010; Sporleder & Li, 2010), domain membership (Dunn, 2013a, 2013b), word clusters that represent semantic similarity (Shutova & Sun, 2013), and selectional preferences (Wilks, 1978; Mason, 2004). Most of these approaches rely on some concept of abstractness, whether directly (e.g., in terms of abstractness ratings) or indirectly (e.g., in terms of clusters containing abstract words). Further, these approaches have viewed abstractness as a one-dimensional scale between abstract and concrete concepts, with metaphor creating mappings from concrete source concepts to abstract target concepts.
Although both theoretical and computational treatments of metaphor depend upon the concept of abstractness, little has been done to either define or operationalize the notion. To fill this gap, this paper puts forward a multi-dimensional definition of abstractness and implements it in order to test the usefulness of the dimensions of abstractness for detecting cross-domain mappings.

Multi-dimensional abstractness
This approach recognizes four dimensions of abstractness: Domain of the Referent, Domain of the Sense, Fact-Status, and Function-Status, each of which has a range of values from more abstract to less abstract, as shown in Table 1. Domain refers to top-level categories in a hierarchical ontology as in, for example, ontological semantics (Nirenburg & Raskin, 2004), which uses four top-level domains: PHYSICAL, MENTAL, SOCIAL, and ABSTRACT. Each concept belongs within a certain domain so that, at the highest level, cross-domain mappings can be represented as mappings between, for example, a PHYSICAL concept and an ABSTRACT concept. This dimension corresponds most closely to the traditional one-dimensional approach to abstractness.
Here we divide domain membership into two types: (i) Domain of the Sense and (ii) Domain of the Referent. The idea is that a concept may refer to an object in one domain but define properties of that concept relative to another domain. For example, the concept teacher refers to a PHYSICAL object, a human who has physical properties. At the same time, the concept teacher is defined or distinguished from other humans in terms of SOCIAL properties, such as being focused on the education of students. Thus, the referent of the concept is within the PHYSICAL domain but its sense is within the SOCIAL domain. This is also true, for example, of countries (e.g., Mexico), which refer to a PHYSICAL location but also to a SOCIAL entity, the government and people who reside in that physical location. It is important to distinguish sense and reference when searching for cross-domain mappings because many concepts inherently map between different domains in this way (and yet are not considered metaphoric). Within both types of Domain, ABSTRACT is the category with the highest abstractness and PHYSICAL the category with the lowest.
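The sense/referent distinction can be illustrated with a minimal sketch. The class name `ConceptDomains` and the property `spans_domains` are hypothetical and are not part of the system described here; the values for teacher follow the example above, while the sense domain given for rock is an assumption made only for contrast.

```python
from dataclasses import dataclass

# The four top-level domains named in the text.
DOMAINS = ("PHYSICAL", "MENTAL", "SOCIAL", "ABSTRACT")


@dataclass(frozen=True)
class ConceptDomains:
    """Hypothetical record pairing a concept with its two Domain values."""
    name: str
    referent: str
    sense: str

    def __post_init__(self):
        # Reject values outside the four top-level domains.
        for domain in (self.referent, self.sense):
            if domain not in DOMAINS:
                raise ValueError(f"unknown domain: {domain}")

    @property
    def spans_domains(self):
        # True when the concept inherently links two domains without
        # being metaphoric, as with teacher.
        return self.referent != self.sense


teacher = ConceptDomains("teacher", referent="PHYSICAL", sense="SOCIAL")
rock = ConceptDomains("rock", referent="PHYSICAL", sense="PHYSICAL")  # assumed values
```

Keeping the two values separate lets a detector discount domain differences that are internal to a single concept, so that such concepts are not mistaken for cross-domain mappings.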
Fact-Status is an ontological property as opposed to a domain within a hierarchical ontology. It represents the metaphysical property of a concept's dependence on human consciousness (Searle, 1995). In other words, PHYSICAL-FACTS are those, like rocks and trees, which exist in the external world independent of human perceptions. NON-INTENTIONAL facts are involuntary human perceptions such as pain or fear. INTENTIONAL facts are voluntary products of individual human consciousness such as ideas and opinions. COLLECTIVE facts are products of the consciousness of groups of humans, such as laws and governments. Thus, all categories except for PHYSICAL-FACTS are dependent on human consciousness. NON-INTENTIONAL and INTENTIONAL facts depend only on individuals, and in this sense are less abstract than COLLECTIVE facts, which exist only if a group of humans agrees to recognize their existence. This dimension of abstractness measures how dependent on human consciousness and how socially-constructed a concept is, with COLLECTIVE facts being more socially-constructed (and thus more society-dependent) than the others.
The final dimension of abstractness is Function-Status, which reflects how embedded function information is in the sense of a concept. Function information is human-dependent, being present only as assumed by humans; thus, this dimension is also related to how human-centric a particular concept is. Many concepts have no function information embedded in them, for example rock or tree, and these are the least human-dependent. Some concepts have NON-AGENTIVE functions, sometimes called NATURAL functions; for example, the function of a heart is to pump blood. Some concepts have PHYSICAL-USE functions, in which the embedded function is a reflection of how humans use a physical object; for example, the function of a hammer is to drive nails. Finally, many concepts have embedded within them INSTITUTIONAL functions, those which perform a social function only insofar as a group of individuals agree that the social function is performed. For example, a group of individuals may declare that certain taxes will be collected on income; but if others do not consent to the performance of that function then it is not performed (e.g., if the group had no legal authority to do so). Thus, INSTITUTIONAL functions have the highest abstractness.
In addition to these dimensions of abstractness, two properties are added in order to test how they interact with these dimensions of abstractness: Event-Status, distinguishing OBJECTS from STATES and PROCESSES, and Animacy, distinguishing HUMANS from ANIMATE non-humans and INANIMATE objects.

Implementation
The system has two main steps: first, the input text is mapped to concepts in the Suggested Upper Merged Ontology (Niles & Pease, 2001); second, features based on the ontological properties of these concepts are used to represent each input sentence as a feature vector. The text is processed using Apache OpenNLP for tokenization, named entity recognition, and part of speech tagging. Morpha (Minnen, et al., 2001) is used for lemmatization. At this point word sense disambiguation is performed using SenseRelate (Pedersen & Kolhatkar, 2009), mapping the lexical words to the corresponding WordNet senses. These WordNet senses are first mapped to SynSets and then to concepts in the SUMO ontology, using existing mappings (Niles & Pease, 2003). Thus, the input to the second part of the system is the set of SUMO concepts which are pointed to by the input text. The properties of these concepts are contained in a separate knowledge-base developed for this system and available from the author. Each concept in SUMO has a value for each of the concept properties. This value is fixed and is the same across all instances of that concept. Thus, SenseRelate disambiguates input text into WordNet synsets which are mapped onto SUMO concepts, at which point the mapping from concepts to concept properties is fixed.
The concept properties discussed above are used to create a total of 41 features, as shown in Table 2. First, 23 features contain the total number of instances of each possible value for the properties in each sentence relative to the number of concepts present. Second, 6 features contain the relative frequency of the most common value of a property (the "main" value) and 6 features the relative frequency of all the other values (the "other" value). Third, 6 features contain the number of types of property values present in a sentence relative to the number of possible types.
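The three feature groups can be sketched as follows. This is an illustration under stated assumptions, not the system's actual code: the function name `sentence_features`, the dictionary keys, and the value labels are hypothetical, and the value inventories are read off the prose above rather than taken from Table 1. They give 22 value features, so the sketch yields 40 features rather than the 41 reported; Table 1 presumably lists one value that the prose does not name.

```python
from collections import Counter

# Value inventories read off the prose; the labels are assumptions and
# Table 1 remains the authoritative list.
PROPERTY_VALUES = {
    "domain_sense": ("PHYSICAL", "MENTAL", "SOCIAL", "ABSTRACT"),
    "domain_referent": ("PHYSICAL", "MENTAL", "SOCIAL", "ABSTRACT"),
    "fact_status": ("PHYSICAL", "NON-INTENTIONAL", "INTENTIONAL", "COLLECTIVE"),
    "function_status": ("NONE", "NON-AGENTIVE", "PHYSICAL-USE", "INSTITUTIONAL"),
    "event_status": ("OBJECT", "STATE", "PROCESS"),
    "animacy": ("HUMAN", "ANIMATE", "INANIMATE"),
}


def sentence_features(concepts):
    """Build the feature dictionary for one sentence.

    `concepts` is a list of dicts, one per concept in the sentence,
    each mapping every property name to one of its values.
    """
    n = len(concepts)
    features = {}
    for prop, values in PROPERTY_VALUES.items():
        counts = Counter(concept[prop] for concept in concepts)
        # Group 1: frequency of each value relative to the number of concepts.
        for value in values:
            features[f"{prop}:{value}"] = counts[value] / n if n else 0.0
        # Group 2: relative frequency of the most common ("main") value
        # and of all remaining ("other") values.
        main = max(counts.values(), default=0)
        features[f"{prop}:main"] = main / n if n else 0.0
        features[f"{prop}:other"] = (n - main) / n if n else 0.0
        # Group 3: distinct values present relative to possible values.
        features[f"{prop}:types"] = len(counts) / len(values)
    return features


# Two hypothetical concepts (illustrative property values only).
example = [
    {"domain_sense": "SOCIAL", "domain_referent": "PHYSICAL",
     "fact_status": "COLLECTIVE", "function_status": "INSTITUTIONAL",
     "event_status": "OBJECT", "animacy": "HUMAN"},
    {"domain_sense": "PHYSICAL", "domain_referent": "PHYSICAL",
     "fact_status": "PHYSICAL", "function_status": "NONE",
     "event_status": "OBJECT", "animacy": "INANIMATE"},
]
features = sentence_features(example)
```

Because every concept carries exactly one value per property, the "main" and "other" features for a property sum to 1 in any non-empty sentence.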

Evaluation of the Features
We evaluated these features in a binary classification task using the VU Amsterdam Metaphor Corpus (Steen, et al., 2010), which consists of 200,000 words from the British National Corpus divided into four genres (academic, news, fiction, and spoken; the spoken genre was not evaluated) and annotated by five linguists. Metaphorically used prepositions have been untagged, as have ambiguously metaphoric sentences. Non-sentence fragments have been removed (e.g., titles and bylines), along with very short sentences (e.g., "He said.").
The first step was to evaluate the features individually for their usefulness in detecting metaphoric language, allowing us to ask theoretical questions about which dimensions of abstractness are most related to metaphor. The ClassifierSubsetEval feature in Weka (Hall, et al., 2009) was used with the logistic regression classifier on the full corpus with 10-fold cross-validation. Three different search algorithms were used to ensure that the best possible combination of variables was found: the Greedy Stepwise, Linear Forward Selection, and Rank Search algorithms. The final feature rating was computed by taking the reverse ranking given by the Greedy Stepwise search (e.g., the top ranked feature out of 41 is given a 41) and adding the number of folds for which that feature was selected by the other two algorithms. Table 3 below shows the top variables, arranged by score.
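The rating arithmetic described above can be written out as a short sketch. The function name `feature_scores` and the input shapes are assumptions made for illustration: a list of feature names ordered best-first by the Greedy Stepwise search, and two dictionaries giving the number of folds in which each of the other two searches selected a feature. The toy values are invented and are not results from this study.

```python
def feature_scores(greedy_ranking, folds_linear_forward, folds_rank_search):
    """Combine a best-first ranking with per-fold selection counts.

    The top-ranked of n features receives n points, the next n - 1,
    and so on; fold counts from the two other searches are then added.
    """
    n = len(greedy_ranking)
    scores = {}
    for position, name in enumerate(greedy_ranking):
        reverse_rank = n - position
        scores[name] = (
            reverse_rank
            + folds_linear_forward.get(name, 0)
            + folds_rank_search.get(name, 0)
        )
    return scores


# Toy example with three hypothetical features and 10 folds.
toy_scores = feature_scores(
    ["a", "b", "c"],
    folds_linear_forward={"a": 10, "c": 4},
    folds_rank_search={"a": 9, "b": 2},
)
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

With 41 features and 10 folds, the maximum possible score under this scheme is 41 + 10 + 10 = 61.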
An interesting finding from this selection process is that each of the concept properties made the list of the top 16 features in the form of the Property: Other feature. In other words, the number of minority values for each property is useful for detecting cross-domain mappings. Next, each of the values for the Function property was a top feature, while only two of the Domain-Sense and one of the Domain-Referent properties made the list. The properties of Animacy and Fact are represented by the number of types present in the utterance, and Fact is also significant for the number of concepts with the Collective value. These are interesting, and unexpected, findings, because the most important properties for detecting metaphor are not the traditional Domain-defined notions of abstractness, either Sense or Referent, but rather those notions of abstractness which are tied to a concept's degree of dependence on human consciousness and degree of being socially-constructed.
Using these top 16 variables, a binary classification task was performed on the entire VU Amsterdam Corpus, prepared as described above, using the logistic regression algorithm with 10-fold cross-validation, giving the results shown below in Table 4. These results show that while the full set of 41 features performs slightly better than the select set of the top 16, the performance gain is fairly small. For example, the F-measure on the full corpus rises from 0.629, using only the top 16 variables, to 0.649 using the full set of 41 variables. Thus, a similar performance is achieved much more efficiently (at least, in terms of the evaluation of the feature vectors; the top 16 variables still require many of the other variables in order to be computed). More importantly, this shows that the different dimensions of abstractness can be used to detect cross-domain mappings, licensing the inference that each of these operationalizations of abstractness represents an important and independent property of cross-domain mappings.

Relation between the dimensions of abstractness
In order to determine the relationship between these dimensions of abstractness, to be sure that they are not measuring only a single scale, principal components analysis with the direct oblimin rotation method was used to determine how many distinct groups are formed by the properties and their values. The written subset of the American and Canadian portions of the International Corpus of English, consisting of 44,189 sentences, was used for this task. The corpus was not annotated for metaphor; rather, the purpose is to find the relation between the features across both metaphor and non-metaphor. This procedure identified 10 components with eigenvalues above 1 containing unique highest value features, accounting for a cumulative 83.2% of the variance. These components are shown in Table 5. These components show two important results: First, the division of the Domain property into Sense and Referent is not necessary because the two are always contained in the same components; in other words, these really constitute a single dimension of abstractness. Second, Domain, Function, and Fact-Status are not contained in the same components, but rather remain distinguishable dimensions of abstractness.
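The eigenvalue criterion used above can be sketched with NumPy. This is a simplified illustration: it counts the components of the feature correlation matrix whose eigenvalue exceeds 1 and reports their share of the variance, but it omits the direct oblimin rotation applied in the study. The function name `components_above_one` and the synthetic data are assumptions; the data stand in for a sentence-by-feature matrix and are not drawn from the International Corpus of English.

```python
import numpy as np


def components_above_one(feature_matrix):
    """Return (number of components with eigenvalue > 1, variance share).

    Rows are sentences and columns are features. Constant columns are
    not handled: they would make the correlation matrix undefined.
    """
    corr = np.corrcoef(feature_matrix, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in
    # ascending order; reverse them so the largest comes first.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    keep = eigenvalues > 1.0
    explained = eigenvalues[keep].sum() / eigenvalues.sum()
    return int(keep.sum()), float(explained)


# Synthetic stand-in: six columns driven by two independent latent
# factors plus a small amount of noise (hypothetical data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
noise = 0.1 * rng.normal(size=(1000, 6))
toy_matrix = np.repeat(latent, 3, axis=1) + noise
n_components, explained = components_above_one(toy_matrix)
```

Features that load on the same retained component, as Domain of the Sense and Domain of the Referent do in Table 5, are treated as measuring a single underlying dimension.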
The important point of this analysis of the relations between features is that, even for those systems which do not represent abstractness in this way (e.g., systems which use numeric scales instead of nominal attributes), the dimensions of abstractness used here do represent independent factors. In other words, there is more than one dimension of abstractness. Domain membership, which corresponds most closely to the traditional one-dimensional view, refers essentially to how concrete or physical a concept is. Thus, love is more abstract than grass, but no distinction is possible between love and war. Fact-Status refers to how dependent on human consciousness a concept is. PHYSICAL concepts do not depend upon humans in order to exist. Thus, PHYSICAL concepts will be represented with the same degree of abstractness by both the Domain and Fact-Status properties. However, Fact-Status adds distinctions between abstract concepts. For example, ideas are not physical, but laws are both non-physical and depend upon complex social agreements. Function-Status refers to how much of the definition of a concept is dependent upon Function information which is, ultimately, only present in human understandings of the concept. This dimension adds distinctions between even physical concepts. For example, canes are just as physical as sticks, but cane embeds function information, that the object is used to help a human to walk, and this function information is dependent upon human consciousness. These two additional and distinguishable dimensions of abstractness, then, operationalize how dependent a concept is on human consciousness and how socially-constructed it is.
Using the traditional one-dimensional approach to abstractness, not all metaphors have abstract target concepts. For example, in the metaphoric expressions "My car drinks gasoline" and "My surgeon is a butcher," the concepts CAR and SURGEON are both PHYSICAL concepts in terms of Domain, and thus not abstract. And yet these concepts are the targets of metaphors. However, the concept DRINKING, according to this system, has an INTENTIONAL Fact-Status, because it is an action which is performed purposefully, and thus is an action which only sentient beings can perform. It is more abstract, then, than a verb like uses, which would not be metaphoric. The second example, however, cannot be explained in this way, as both SURGEON and BUTCHER would have the same concept properties (they are not included in the knowledge-base; both map to HUMAN). This phrase occurs only twice in the 450+ million word Corpus of Contemporary American English, however, and represents a rare exception to the rule.

Conclusions
This paper has examined the notion of abstractness, an essential component of many theoretical and computational approaches to the cross-domain mappings which create metaphoric language. There are two important findings: First, of the four posited dimensions of abstractness, three were shown to be both (1) members of separate components and (2) useful for detecting metaphoric mappings. These three dimensions, Domain Membership, Fact-Status, and Function-Status, are different and distinguishable ways of defining and operationalizing the key notion of abstractness. Second, and perhaps more importantly, the Fact-Status and Function-Status dimensions of abstractness, which are not directly present in the traditional one-dimensional view of abstractness, were shown to be the most useful for detecting metaphoric mappings.
Although more evidence is needed, this suggests that cross-domain mappings are mappings from less socially-constructed source concepts to more socially-constructed target concepts and from less consciousness-dependent source concepts to more consciousness-dependent target concepts. This multi-dimensional approach thus provides a more precise definition of abstractness.