Abductive Inference for Interpretation of Metaphors

This paper presents a metaphor interpretation pipeline based on abductive inference. In this framework, following (Hobbs, 1992), metaphor interpretation is modelled as part of the general discourse processing problem, such that overall discourse coherence is supported. We present an experimental evaluation of the proposed approach using linguistic data in English and Russian.


Introduction
In this paper, we elaborate on a semantic processing framework based on a mode of inference called abduction, or inference to the best explanation. In logic, abduction is a kind of inference which arrives at an explanatory hypothesis given an observation. (Hobbs et al., 1993) describe how abduction can be applied to the discourse processing problem, viewing the process of interpreting sentences in discourse as the process of providing the best explanation of why the sentence would be true. (Hobbs et al., 1993) show that abductive reasoning as a discourse processing technique helps to solve many pragmatic problems such as reference resolution, the interpretation of noun compounds, detection of discourse relations, etc. as a by-product. (Hobbs, 1992) explains how abduction can be applied to interpretation of metaphors.
The term conceptual metaphor (CM) refers to the understanding of one concept or conceptual domain in terms of the properties of another (Lakoff and Johnson, 1980; Lakoff, 1987). For example, development can be understood as movement (e.g., the economy moves forward, the engine of the economy). In other words, a conceptual metaphor consists in mapping a target conceptual domain (e.g., economy) to a source domain (e.g., vehicle) by comparing their properties (e.g., an economy develops like a vehicle moves). In text, conceptual metaphors are represented by linguistic metaphors (LMs), i.e. natural language phrases expressing the implied comparison of two domains.
We present a metaphor interpretation approach based on abduction. We developed an end-to-end metaphor interpretation system that takes text potentially containing linguistic metaphors as input, detects linguistic metaphors, maps them to conceptual metaphors, and interprets conceptual metaphors in terms of both logical predicates and natural language expressions. Currently, the system can process linguistic metaphors mapping predefined target and source domains.
We perform an experimental evaluation of the proposed approach using linguistic data in two languages: English and Russian. We select target concepts and generate potential sources for them as described at github.com/MetaphorExtractionTools/mokujin. For top-ranked sources, we automatically find corresponding linguistic metaphors. These linguistic metaphors are each then validated by three expert linguists. For the validated linguistic metaphors, we generate natural language interpretations, which are also validated by three experts.

Related Work
Automatic interpretation of linguistic metaphors is performed using two principal approaches: 1) deriving literal paraphrases for metaphorical expressions from corpora (Shutova, 2010; Shutova et al., 2012) and 2) reasoning with manually coded knowledge (Hobbs, 1992; Narayanan, 1999; Barnden and Lee, 2002; Agerri et al., 2007; Veale and Hao, 2008). (Shutova, 2010; Shutova et al., 2012) present methods for deriving paraphrases for linguistic metaphors from corpora. For example, the metaphorical expression "a carelessly leaked report" is paraphrased as "a carelessly disclosed report". This approach currently focuses on single-word metaphors expressed by verbs only and does not explain the target-source mapping.
The KARMA (Narayanan, 1999) and ATT-Meta (Barnden and Lee, 2002; Agerri et al., 2007) systems perform reasoning with manually coded world knowledge and operate mainly in the source domain. The ATT-Meta system takes as input logical expressions that represent a small discourse fragment; i.e., it does not work with natural language. KARMA focuses on dynamics and motion in space. For example, the metaphorical expression the government is stumbling in its efforts is interpreted in terms of motion in space: stumbling leads to falling, while falling is a conventional metaphor for failing.

(Veale and Hao, 2008) suggest deriving common-sense knowledge from WordNet and corpora in order to obtain concept properties that can be used for metaphor interpretation. Simple inference operations, i.e., insertions, deletions, and substitutions, allow the system to establish links between target and source concepts.

(Hobbs, 1992) understands metaphor interpretation as part of the general discourse processing problem. According to Hobbs, a metaphorical expression should be interpreted in context. For example, John is an elephant is best interpreted as "John is clumsy" in the context Mary is graceful, but John is an elephant. In order to obtain context-dependent interpretations, (Hobbs, 1992) uses abductive inference linking parts of the discourse and ensuring discourse coherence.

Metaphor Interpretation System
Our abduction-based metaphor interpretation system is shown in Fig. 1. Text fragments possibly containing linguistic metaphors are given as input to the pipeline. The text fragments are parsed and converted into logical forms (section 3.1). The logical forms are input to the abductive reasoner (section 3.2) that is informed by a knowledge base (section 4). The processing component labelled "CM extractor & scorer" extracts conceptual metaphors from the logical abductive interpretations and outputs scored CMs and Target-Source mappings (section 3.3). The Target-Source mappings are then translated into natural language expressions by the NL generator module (section 3.4).
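As a rough illustration of how the stages fit together, the pipeline can be sketched as follows. Every function name and the toy knowledge base here are hypothetical stand-ins for exposition, not the system's actual code:

```python
# Minimal sketch of the pipeline's four stages (Secs. 3.1-3.4).
# All names and data structures are illustrative, not the system's API.

def parse_to_logical_form(text):
    # Sec. 3.1: parse and convert to a logical form (here: token list).
    return text.lower().split()

def abductive_reasoner(logical_form, knowledge_base):
    # Sec. 3.2: map lexical items to domain labels via the knowledge base.
    return {tok: knowledge_base[tok] for tok in logical_form if tok in knowledge_base}

def extract_and_score_cms(interpretation):
    # Sec. 3.3: pair target and source domain triggers as candidate CMs
    # (scoring by dependency-path length is omitted in this sketch).
    targets = [d for d in interpretation.values() if d.startswith("T:")]
    sources = [d for d in interpretation.values() if d.startswith("S:")]
    return [(t[2:], s[2:]) for t in targets for s in sources]

def generate_nl(cm):
    # Sec. 3.4: verbalize a Target-Source mapping.
    target, source = cm
    return f"{target} is understood as {source}"

kb = {"poverty": "T:POVERTY", "cure": "S:DISEASE"}
lf = parse_to_logical_form("We intend to cure poverty")
cms = extract_and_score_cms(abductive_reasoner(lf, kb))
print([generate_nl(cm) for cm in cms])  # ['POVERTY is understood as DISEASE']
```

The real reasoner performs weighted abductive inference over a logical knowledge base rather than dictionary lookup; the sketch only shows the data flow between the components in Fig. 1.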

Logical Form Generation
A logical form (LF) is a conjunction of propositions which have argument links showing relationships among phrase constituents. We use logical representations of natural language texts as described in (Hobbs, 1985). In order to obtain LFs we convert dependency parses into logical representations in two steps: 1) assign arguments to each lemma, 2) apply rules to dependencies in order to link arguments.
LFs are preferable to dependency structures in this case because they generalize over syntax and link arguments using long-distance dependencies. Furthermore, we need logical representations in order to apply abductive inference.
In order to produce logical forms for English, we use the Boxer semantic parser (Bos et al., 2004). As one of its possible output formats, Boxer produces logical forms of sentences in the style of (Hobbs, 1985). For Russian, we use the Malt dependency parser (Nivre et al., 2006), and we developed a converter turning Malt dependencies into logical forms in the style of (Hobbs, 1985).
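The two-step conversion described above (assign arguments to each lemma, then link arguments along dependency arcs) can be sketched roughly as follows. The rule set and the argument-slot conventions are simplified illustrations, not the actual converter:

```python
# Toy dependency-to-LF converter: step 1 assigns fresh argument variables
# per lemma, step 2 unifies them via dependency rules.  Only subject and
# object rules are shown; the real converter covers many more relations.
from itertools import count

def dependencies_to_lf(tokens, deps):
    """tokens: list of (index, lemma, pos); deps: list of (head, rel, dep)."""
    fresh = count(1)
    # Step 1: verbs get (e, x, y) -- eventuality, subject, object slots;
    # other content words get a single entity variable.
    args = {}
    for i, lemma, pos in tokens:
        if pos == "vb":
            args[i] = [f"e{next(fresh)}", f"x{next(fresh)}", f"x{next(fresh)}"]
        else:
            args[i] = [f"x{next(fresh)}"]
    # Step 2: link arguments along dependency arcs.
    for head, rel, dep in deps:
        if rel == "subj":
            args[head][1] = args[dep][0]  # subject fills the verb's x slot
        elif rel == "obj":
            args[head][2] = args[dep][0]  # object fills the verb's y slot
    return [f"{lemma}-{pos}({','.join(args[i])})" for i, lemma, pos in tokens]

lf = dependencies_to_lf(
    [(0, "we", "pr"), (1, "cure", "vb"), (2, "poverty", "nn")],
    [(1, "subj", 0), (1, "obj", 2)])
print(" & ".join(lf))  # we-pr(x1) & cure-vb(e2,x1,x5) & poverty-nn(x5)
```

Note how the shared variable x1 links we to the subject slot of cure, the long-distance argument identity that LFs make explicit and plain dependency trees do not.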

Abductive Inference
In order to detect conceptual metaphors and infer explicit mappings between target and source domains, we employ a mode of inference called weighted abduction (Hobbs et al., 1993). This framework is appealing because it is a realization of the observation that we understand new material by linking it with what we already know.
Abduction is inference to the best explanation. Formally, logical abduction is defined as follows: given a background theory B and an observation O, find a hypothesis H such that H ∪ B ⊨ O and H ∪ B is consistent. Typically, there exist several hypotheses H explaining O. To rank hypotheses according to plausibility and select the best hypothesis, we use the framework of weighted abduction (Hobbs et al., 1993). Frequently, the best interpretation results from identifying two entities with each other, so that their common properties only need to be proved or assumed once. Weighted abduction favors those interpretations that link parts of observations together and supports discourse coherence, which is crucial for discourse interpretation.
According to (Hobbs, 1992), metaphor interpretation can be modelled as abductive inference revealing conceptual overlap between the target and the source domain. Consider the abductive interpretation produced for the sentence We intend to cure poverty, shown in Fig. 2. In the top line of the figure, we have the LF (cf. Sec. 3.1), where we can see that a person (x1) is the agent of the verbs intend (e1) and cure (e2) and that poverty (x2) is the object of cure. In the first box in the next row, we see that cure invokes the source concepts of DISEASE, CURE, and DOCTOR, where DISEASE is the object of CURE, and DOCTOR is the subject. In the same row, we see that poverty invokes the POVERTY concept in the target domain. Importantly, POVERTY and DISEASE share the same argument (x2), which refers to poverty.
The next row contains two boxes with ellipses, representing long chains of common-sense inferences in the source and target domains of DISEASE and POVERTY, respectively. For DISEASE we know that linguistic tokens such as illness, sick, disease, etc. cause the afflicted to experience loss of health, loss of energy, and a general lack of productivity. For POVERTY, we know that tokens such as poor, broke, poverty mean that the experiencer of poverty lacks money to buy things, take care of basic needs, or have access to transportation. The end result of both of these frameworks is that the affected individuals (or communities) cannot function at a normal level with respect to unaffected peers. We can use this common meaning of causing the individual to not function to link the target to the source.
The next three rows provide the mapping from the meaning of the source concepts (CURE, DOCTOR, DISEASE) to the target concept (POVERTY). As explained above, we can consider DISEASE as a CAUSING-AGENT that can CAUSE NOT FUNCTION; POVERTY can be explained the same way at a certain level of abstraction. Essentially, the interpretation of poverty in this sentence is that it causes some entity not to function, which is what a DISEASE does as well. For CURE, we see that cure can CAUSE NOT EXIST, while looking for a CAUSING-AGENT (person) and an EXISTING DISEASE (poverty).
In our system, we use the implementation of weighted abduction based on Integer Linear Programming (ILP) (Inoue and Inui, 2012), which makes the inference scalable.
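To give an intuition for the cost-based ranking that the ILP formulation optimizes, here is a deliberately tiny sketch: each observed literal can either be assumed at its own cost or back-chained on an axiom, paying the axiom weight times the cost for the antecedent. The axioms, weights, and the single-antecedent restriction are invented for illustration; the actual reasoner handles conjunctive antecedents, unification, and much larger search spaces:

```python
# Toy illustration of cost-based hypothesis ranking in weighted abduction
# (Hobbs et al., 1993).  Axioms are Horn rules "antecedent -> consequent"
# with a weight attached to the antecedent.

AXIOMS = {
    # consequent: list of (antecedent, weight)
    "wet-grass": [("rain", 0.6), ("sprinkler", 0.9)],
}

def interpretations(observation, cost):
    """Yield (hypothesis, total_cost): either assume the observation
    outright, or explain it by assuming one antecedent at weight * cost."""
    yield frozenset([observation]), cost
    for antecedent, weight in AXIOMS.get(observation, []):
        yield frozenset([antecedent]), weight * cost

# The lowest-cost hypothesis is the "best explanation".
best = min(interpretations("wet-grass", 10.0), key=lambda h: h[1])
print(best)
```

In the full framework, unifying two literals lets their costs be paid once, which is exactly why interpretations that link parts of the observation together come out cheaper.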

CM Extractor and Scorer
The abductive reasoning system produces an interpretation that contains mappings of lexical items into Target and Source domains. Any Target-Source pair detected in a text fragment constitutes a potential CM. For some text fragments, the system identifies multiple CMs. We score Target-Source pairs according to the length of the dependency path linking them in the predicate-argument structure. Consider the following text fragment: "opponents argue that any state attempting to force an out-of-state business to do its dirty work of tax collection violates another state's right to regulate its own corporate residents and their commerce". Suppose our target domain is TAXATION, triggered by tax collection in the sentence above. In our corpus, we find realizations of the CM TAXATION is an ENEMY (fight against taxes), and the lexeme opponent triggers the STRUGGLE/ENEMY domain. However, the sentence does not trigger the CM TAXATION is an ENEMY; instead, it instantiates the CM TAXATION is DIRT (dirty work of tax collection). The length of the dependency path between dirty and tax is 2, whereas the path between opponent and tax is 9. Therefore, our procedure ranks TAXATION is DIRT higher, which corresponds to the intuition that target and source words should constitute a syntactic phrase in order to trigger a CM.
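The path-length scoring can be sketched with a breadth-first search over the (undirected) dependency graph. The toy edge list below is a simplified stand-in for the real parse of the example sentence, so the path lengths differ from the 2 and 9 reported above, but the ranking logic is the same:

```python
# Sketch of the CM scorer: rank Target-Source pairs by the length of the
# dependency path between their trigger words (shorter path = higher rank).
from collections import deque

def path_length(edges, a, b):
    """BFS shortest-path length between two words in a dependency graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path

# Toy dependency edges for the tax-collection example (structure simplified).
edges = [("work", "dirty"), ("work", "of"), ("of", "collection"),
         ("collection", "tax"), ("argue", "opponents"), ("argue", "violates"),
         ("violates", "work")]
# Candidate CMs with their (target word, source word) triggers.
candidates = {("TAXATION", "DIRT"): ("tax", "dirty"),
              ("TAXATION", "ENEMY"): ("tax", "opponents")}
ranked = sorted(candidates, key=lambda cm: path_length(edges, *candidates[cm]))
print(ranked[0])  # the shorter-path pair ranks first
```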

NL Representation of Metaphor Interpretation
The output of the abduction engine is similar to the logical forms provided in Fig. 2. In order to make the output more reader-friendly, we produce a natural language representation of the metaphor interpretation using templates for each CM. For example, the text their rivers of money mean they can offer far more than a single vote would invoke the WEALTH is WATER CM, and the abduction engine would output: LARGE-AMOUNT[river], THING-LARGE-AMOUNT[money]. We then use this information as input to the NL generation module to produce: "river" implies that there is a large amount of "money".
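A minimal sketch of this template filling, using the predicate labels from the WEALTH is WATER example above (the template dictionary and function are illustrative, not the system's actual generator):

```python
# Template-based NL generation: each CM-specific predicate pattern has a
# template whose slots are filled with the words bound by the abduction
# engine's output.

TEMPLATES = {
    "LARGE-AMOUNT": '"{src}" implies that there is a large amount of "{thing}"',
}

def verbalize(predicates):
    """predicates: dict mapping engine output labels to the bound words."""
    src = predicates["LARGE-AMOUNT"]
    thing = predicates["THING-LARGE-AMOUNT"]
    return TEMPLATES["LARGE-AMOUNT"].format(src=src, thing=thing)

out = verbalize({"LARGE-AMOUNT": "river", "THING-LARGE-AMOUNT": "money"})
print(out)  # "river" implies that there is a large amount of "money"
```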

Knowledge Base
In order to process metaphors with abduction, we need a knowledge base that encodes the information about the source domain, the target domain, and the relationships between sources and targets. We develop two distinct sets of axioms: lexical axioms that encode lexical items triggering domains, and mapping axioms that encode knowledge used to link source and target domains. We will discuss the details of each axiom type next.

Lexical Axioms
Every content word or phrase that can be expected to trigger a source or target domain is included as a lexical axiom in the knowledge base. For example, the STRUGGLE domain contains words like war, fight, combat, conquer, weapon, etc. An example of how a lexical axiom encodes the system logic is given in (1):

(1) fight-vb(e0, x, y) → STRUGGLE(e0) ∧ AGENT(x, e0) ∧ ENEMY(y, e0)

On the left side, we have the linguistic token, fight, along with its part-of-speech, vb, and the argument structure for verbs, where e0 is the eventuality (see (Hobbs, 1985)) of the action of fighting, x is the subject of the verb, and y is the object. On the right side, STRUGGLE is linked to the action of fighting, the subject is marked as the AGENT, and the object is marked as the ENEMY.

The lexicon is not limited to single-token entries; phrases can be included as single entries. For example, the ABYSS domain has phrases such as climb out of as a single entry. Encoding phrases often proves useful, as function words can help to distinguish one domain from others. In this case, climbing out of something usually denotes an abyss, whereas climbing up or on usually does not. The lexical axioms also include the POS for each word; thus a word like fight can be entered as both a noun and a verb. In cases where a single lexical axiom could be applied to multiple domains, one can create multiple entries for the axiom with different domains and assign weights so that a certain domain is preferred over others.

Initial lexical axioms for each domain were developed based on intuition about each domain. We then utilize ConceptNet (Havasi et al., 2007) as a source for semi-automatically extracting a large-scale lexicon. ConceptNet is a multilingual semantic network that establishes links between words and phrases. We query ConceptNet with our initial lexical axioms to return a list of related words and expressions.
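A possible in-memory encoding of such a lexicon is sketched below. The specific entries, the GAME alternative for war, and the weight values are invented for illustration; only the general shape (lemma plus POS mapping to weighted domains, phrases as single entries) reflects the description above:

```python
# Illustrative lexical-axiom store: each (lemma, POS) entry -- single word
# or multiword phrase -- maps to one or more weighted domains.

LEXICAL_AXIOMS = {
    ("fight", "vb"): [("STRUGGLE", 1.0)],
    ("fight", "nn"): [("STRUGGLE", 1.0)],       # same word, both POS entries
    ("climb out of", "vb"): [("ABYSS", 1.0)],   # phrase as a single entry
    # An ambiguous item can carry several domains with preference weights
    # (lower weight = cheaper to assume = preferred; entries invented).
    ("war", "nn"): [("STRUGGLE", 0.9), ("GAME", 1.2)],
}

def trigger_domains(lemma, pos):
    """Return candidate domains for a lexical item, preferred first."""
    return sorted(LEXICAL_AXIOMS.get((lemma, pos), []), key=lambda d: d[1])

print(trigger_domains("war", "nn"))  # STRUGGLE preferred over GAME
```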

Mapping Axioms
Mapping axioms provide the underlying meanings for metaphors and link source and target domains. All of these axioms are written by hand based on common-sense world knowledge about each target-source pair. For each CM, we consider a set of LMs that are realizations of this CM in an effort to capture inferences that are common for all of the LMs. We consider the linguistic contexts of the LMs and overlapping properties of the target and source domains derived from corpora as described in section 5.1.
We will outline the process of axiomatizing the STRUGGLE domain here. We know that a verb like fight includes concepts for the struggle itself, an agent, and an enemy. In the context of a STRUGGLE, an enemy can be viewed as some entity a that attempts to, or actually does, inhibit the functioning of some entity b, often through actual physical means, but also psychologically, economically, etc. The struggle, or fight, itself then, is an attempt by a to rid itself of b so that a can ensure normal functionality. So, given a phrase like poverty is our enemy, the intended meaning is that poverty is hindering the functionality of some entity (an individual, a community, a country, etc.) and is seen as a problem that must be fought, i.e. eliminated. In a phrase like the war against poverty, war refers to an effort to stop the existence of poverty. These inferences are supported by the overlapping property propositions extracted from English Gigaword as described in Sec. 5.1, e.g., scourge of X, country fights X, country pulls of X, suffer from X, fight against X.
Here, we encode a STRUGGLE action, e.g., fight, as CAUSE NOT EXIST, the AGENT of the fight as CAUSING-AGENT, and the ENEMY as EXISTING-THING. Then, for a verb phrase like we fight poverty, we is the AGENT that engages in causing poverty, the ENEMY, to not exist:

(2) STRUGGLE(e0) ∧ AGENT(x, e0) ∧ ENEMY(y, e0) → CAUSE-NOT-EXIST(e0) ∧ CAUSING-AGENT(x, e0) ∧ EXISTING-THING(y, e0)

We use 75 mapping axioms to cover the valid LMs discussed in Sec. 5.2. Some interesting trends emerge when examining the core meanings of the LMs. Following (Hobbs, 2005), we found that over 65% of the valid LMs in this study could be explained in terms of causality. The next most prevalent aspect that these metaphors touch upon is functionality (nearly 35%), with some of these overlapping with the causality aspect where the meaning has to do with X causing Y to function or not function.
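Viewed operationally, a mapping axiom rewrites source-domain roles into the abstract meaning layer. The dictionary-based sketch below illustrates this for the STRUGGLE mapping just described; the representation is a simplification of the actual logical axioms:

```python
# Sketch of a mapping axiom as a role rewrite: a STRUGGLE action becomes
# CAUSE-NOT-EXIST, its AGENT a CAUSING-AGENT, its ENEMY an EXISTING-THING.

MAPPING_AXIOMS = {
    "STRUGGLE": {"action": "CAUSE-NOT-EXIST",
                 "AGENT": "CAUSING-AGENT",
                 "ENEMY": "EXISTING-THING"},
}

def apply_mapping(domain, roles):
    """roles: dict of source-domain roles to the words filling them."""
    mapping = MAPPING_AXIOMS[domain]
    return {mapping[role]: word for role, word in roles.items()}

# "we fight poverty": we = AGENT, poverty = ENEMY of a STRUGGLE action.
print(apply_mapping("STRUGGLE",
                    {"action": "fight", "AGENT": "we", "ENEMY": "poverty"}))
# {'CAUSE-NOT-EXIST': 'fight', 'CAUSING-AGENT': 'we', 'EXISTING-THING': 'poverty'}
```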
Many of the CMs covered in this study have fairly transparent interpretations based on these ideas of causality and functionality, such as POVERTY is DISEASE, where the main underlying meaning is that a disease causes the sufferer not to function properly. However, for some CMs, the interpretation can be more difficult to pin down. For example, the interpretation of WEALTH is a GAME is quite opaque. Given a sentence such as, Wealth is a game and you better start playing the game, there are no obvious connections to concepts such as causality or functionality. Instead, game raises such ideas as competition, winning, and losing. In the literal context of a game, the competition itself, who the competitors are, and what it means to win or lose are usually clearly defined, but this is not so when speaking metaphorically about wealth. To derive a meaning of game that can apply to wealth, we must look at a higher level of abstraction and define game as the instantiation of a positive or negative outcome, i.e. to win is to achieve a positive outcome, or gain wealth. In the same sentence play implies that some voluntary action must be taken to achieve a positive outcome.
For some metaphors, a simple transfer of the source properties to the target does not result in a coherent interpretation at all. Given, for example, the CM POVERTY is a PRICE, one LM from this study is, poverty is the price of peace. In this case, the meaning has to do with some notion of an exchange, where a negative consequence must be accepted in order to achieve a desired outcome. However, the metaphorical meaning of price differs from the literal meaning of the word. In literal contexts, price refers to an amount of money or goods with inherent value that must be given to acquire something; the buyer has a supply of money or goods that they willingly exchange for their desired item. In the metaphorical sense, though, there often is no buyer, and there is certainly not an inherent value that can be assigned to poverty, nor can one use a supply of it to acquire peace.
Another issue concerns cultural differences. While writing the axioms to deal with English and Russian source-target pairs we noticed that a majority of the axioms applied equally well to both languages. However, there are some subtle differences of aspect that impact the interpretation of similar CMs across the two languages. Looking again at the WEALTH is a GAME metaphor, the Russian interpretation involves some nuance of a lack of importance about the subject that does not seem to be present in English when using words like game and play. Note that there may be some notion of carelessness for English (see Sec. 5.3), but for Russian, the notion of being carefree, which is not the same as careless, about wealth has a strong prevalence.

Source Generation
Following from the definition of metaphor, the target and the source domain share certain properties. In natural language, concepts and properties are represented by words and phrases. There is a long-standing tradition for considering computational models derived from word co-occurrence statistics as being capable of producing reasonable property-based descriptions of concepts (Baroni and Lenci, 2008). We use proposition stores to derive salient properties of concepts that can be potentially compared in a metaphor.
A proposition store is a collection of propositions such that each proposition is assigned its frequency in a corpus. Propositions are tuples of words that have a determined pattern of syntactic relations among them (Clark and Harrison, 2009; Peñas and Hovy, 2010; Tsao and Wible, 2013). For example, the following propositions can be extracted from the sentence John decided to go to school: (NV John decide), (NV John go), (NVPN John go to school), etc. We generated proposition stores from parsed English Gigaword (Parker et al., 2011) and Russian ruWac (Sharoff and Nivre, 2011). Given the proposition stores, we generate potential sources for a seed target lexeme l in three steps:

1. Find all propositions P_l containing l.

2. Find all potential source lexemes S such that for each s ∈ S there are propositions p, p′ in the proposition store such that l occurs at position i in p and s occurs at position i in p′. The set of propositions containing l and s at the same positions is denoted by P_{l,s}.

3. Weight each potential source s ∈ S by a function of the frequencies of the shared propositions P_{l,s}.

The source generation procedure and its validation are described in detail at github.com/MetaphorExtractionTools/mokujin. In the experiment described below, we generated potential sources for the target domains of POVERTY and WEALTH.
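The three steps can be sketched over a toy proposition store as follows. The weighting used here (summing the smaller of the two frequencies over shared frames) is a stand-in, since the exact formula is documented in the mokujin repository, and the store contents are invented:

```python
# Sketch of generating potential sources from a proposition store:
# (pattern, slot words) -> corpus frequency.  Entries are invented.
from collections import defaultdict

STORE = {
    ("NV", ("poverty", "grow")): 40,
    ("NV", ("disease", "grow")): 55,
    ("NV", ("crop", "grow")): 80,
    ("NVPN", ("country", "fight", "against", "poverty")): 30,
    ("NVPN", ("country", "fight", "against", "disease")): 25,
}

def potential_sources(target):
    scores = defaultdict(float)
    # Step 1: propositions P_l containing the target lexeme.
    for (pat, words), freq in STORE.items():
        if target not in words:
            continue
        i = words.index(target)
        # Step 2: lexemes occupying the same slot in the same frame.
        for (pat2, words2), freq2 in STORE.items():
            if (pat2 == pat and len(words2) == len(words)
                    and words2[:i] + words2[i + 1:] == words[:i] + words[i + 1:]
                    and words2[i] != target):
                # Step 3: stand-in weighting over shared propositions.
                scores[words2[i]] += min(freq, freq2)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(potential_sources("poverty"))  # disease outranks crop here
```

Here disease wins because it shares slots with poverty in two different frames, which matches the intuition that sources sharing many salient contexts with the target are the most promising.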

Linguistic Metaphors Extraction and Validation
For each potential CM, we look for supporting LMs in corpora. A large number of LMs supporting a particular CM suggests that this CM might be cognitively plausible. We use a simple method for finding LMs: if a target lexeme and a source lexeme are connected by a dependency relation in a sentence, then we assume that this dependency structure contains an LM. For example, in the phrases medicine against poverty and chronic poverty, the target word (poverty) is related via a dependency arc to the source words (medicine, chronic). LMs were extracted from English Gigaword (Parker et al., 2011) and Russian ruWac (Sharoff and Nivre, 2011).
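This extraction heuristic amounts to a one-arc filter over parsed sentences, as in the sketch below (the lexeme sets are small illustrative samples, not the expanded lexicons actually used):

```python
# Sketch of the LM finder: a sentence yields a candidate LM when a target
# lexeme and a source lexeme are linked by a single dependency arc.

TARGET_LEXEMES = {"poverty"}
SOURCE_LEXEMES = {"medicine", "chronic", "abyss"}

def find_lms(dep_arcs):
    """dep_arcs: (head_lemma, dependent_lemma) pairs from a parsed sentence."""
    lms = []
    for head, dep in dep_arcs:
        pair = {head, dep}
        if pair & TARGET_LEXEMES and pair & SOURCE_LEXEMES:
            lms.append((head, dep))
    return lms

# "chronic poverty": poverty is the head, chronic the dependent modifier;
# "extreme poverty" triggers no source domain and is filtered out.
print(find_lms([("poverty", "chronic"), ("poverty", "extreme")]))
# [('poverty', 'chronic')]
```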
For the generated CMs, we select seed lexemes for target and source domains. We expand these sets of target and source lexemes with semantically related lexemes using English and Russian ConceptNet (Speer and Havasi, 2013) and top-ranked patterns from the proposition stores. For example, the expansion of the lexeme disease results in the following set of lexemes: {disease, symptom, syndrome, illness, unwellness, sickness, sick, medicine, treatment, treat, cure, doctor, ...}. For each language, we select the 20 top-ranked sources per target. Then we randomly select at most 10 sentences for each target-source pair. Each of these sentences is validated by three expert linguists. For each sentence, the experts are asked whether it contains a metaphor comparing an indicated target domain with an indicated source domain. The inter-annotator agreement on the validation task is defined as the percentage of judgements on which the three experts agree. Agreement is 81% for English and 80% for Russian.
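The agreement measure defined above (percentage of judgements on which all three experts agree) is straightforward to compute; the labels below are invented example data:

```python
# Inter-annotator agreement as the percentage of sentences on which all
# three experts give the same judgement.

def agreement(judgements):
    """judgements: list of (expert1, expert2, expert3) labels per sentence."""
    unanimous = sum(1 for j in judgements if len(set(j)) == 1)
    return 100.0 * unanimous / len(judgements)

labels = [("yes", "yes", "yes"), ("yes", "no", "yes"),
          ("no", "no", "no"), ("yes", "yes", "yes")]
print(agreement(labels))  # 75.0
```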
Tables 1 and 2 show the 10 potential sources per target with the best agreement. Column ALL gives the number of sentences per proposed CM for which all three experts agreed that the sentence contains a metaphor; column TWO gives the number for which at least two experts agreed; and column ONE gives the number for which at least one expert thought the sentence contained a metaphor.

target   source     ALL  TWO  ONE
wealth   blood       10   10   10
         water        9   10   10
         drug         9   10   10
         food         9    9   10
         body         9    9   10
         power        8    9   10
         game         8    9    9
         security     7    9   10
         resource     7    7    9
         disease      7    8    9
poverty  war         10   10   10
         abyss       10   10   10
         violence     9    9   10
         price        8    9    9
         location     7    8    8
         disease      7    7    7
         crime        4    5    6
         crop         3    7    9
         terrorism    3    3    5
         cost         2    3    7

Table 1: Validation of English linguistic metaphors found for potential sources.

Interpretation Validation
For the validated LMs, we generated natural language interpretations as described in Sec. 3.4. Each interpretation was validated by three expert linguists. We calculated strict and relaxed agreement for the validated data. Strict agreement is calculated over three categories: correct (C), partially correct (P), and incorrect (I). Relaxed agreement is calculated over two categories: C/P and I. Partially correct means that the validator felt that something was missing from the interpretation, but that what was there was not wrong. Table 3 presents the validation results for both languages. As can be seen in the table, strict agreement (AgrS) is 62% and 52%, and strict system accuracy (AccS ALL) is 62% and 50% for English and Russian, respectively. Relaxed agreement (AgrR) is 93% and 83%, and relaxed accuracy (AccR ALL) is 91% and 78%.
Validators often marked things as only partially correct if they felt that the interpretation was lacking some aspect that was critical to the meaning of the metaphor. A common feeling amongst the validators, for example, is that the interpretation for people who are terrorized by poverty should include some mention of "fear" as a crucial aspect of the metaphor, as the interpretation provided states only that "terrorize" implies that "poverty" is causing "people" not to function. However, the end result of "fear" itself is often that the experiencer cannot function, as in paralyzed by fear.
Tables 4 and 5 contain interpretation system accuracy results by CM. We calculated the percentage of LMs evoking each CM that were validated as C vs. I (strict) or P/C vs. I (relaxed) by all three (ALL) or just two (TWO) validators. In most cases, the system performs well on "simple" CMs related to the concepts of causation and functioning (e.g., WEALTH is POWER), cf. section 4, whereas its accuracy is lower for richer metaphors (e.g., WEALTH is a GAME). The data used in the described experiments, system output, and expert validations are available at http://ovchinnikova.me/suppl/AbductionSystem-Metaphor-Validation.7z.

Conclusion and Future Work
The developed abduction-based metaphor interpretation pipeline is available at https://github.com/eovchinn/Metaphor-ADP as a free open-source project. The pipeline produces favorable results, with metaphor interpretations rated as at least partially correct for over 90% of all valid metaphors it is given for English, and close to 80% for Russian. Granted, the current research is performed using a small, controlled set of metaphors, so these results could prove difficult to reproduce on a large scale where any metaphor is possible. Still, the high accuracies achieved for both languages indicate that the approach is sound and there is potential for future work. The current axiomatization methodology is based mainly on manually writing mapping axioms informed by the axiom author's intuition. Obviously, this approach is subject to scrutiny regarding the appropriateness of the metaphors and faces scalability issues. Thus, developing new automatic methods to construct the domain knowledge bases is a main area for future consideration.
The mapping axioms present a significant challenge as far as producing reliable output automatically is concerned. One area for consideration is the aforementioned prevalence of certain underlying meanings such as causality and functionality. Gathering enough examples of these by hand could lead to generalizations in argument structure that could then be applied to metaphorical phrases in corpora to extract new metaphors with similar meanings. Crowd-sourcing is another option that could be applied to both axiom writing tasks in order to develop a large-scale knowledge base in considerably less time and at a lower cost than having experts build the knowledge base manually.