Recognizing Causality in Verb-Noun Pairs via Noun and Verb Semantics

Several supervised approaches have been proposed for causality identification by relying on shallow linguistic features. However, such features alone do not lead to improved performance, so novel sources of knowledge are required to make progress on this problem. In this paper, we propose a model for the recognition of causality in verb-noun pairs that employs additional types of knowledge along with linguistic features. In particular, we focus on identifying and employing semantic classes of nouns and verbs with a high tendency to encode cause or non-cause relations. Our model incorporates the information about these classes to minimize errors in the predictions made by a basic supervised classifier relying merely on shallow linguistic features. Compared with this basic classifier, our model achieves a 14.74% (29.57%) improvement in F-score (accuracy), respectively.


Introduction
The automatic detection of causal relations is important for various natural language processing applications such as question answering, text summarization, text understanding and event prediction. Causality can be expressed using various natural language constructions (Girju and Moldovan, 2002; Chang and Choi, 2006). Consider the following examples, where causal relations are encoded using (1) a verb-verb pair, (2) a noun-noun pair and (3) a verb-noun pair.
1. Five shoppers were killed when a car blew up at an outdoor market.
2. The attack on Kirkuk's police intelligence complex sees further deaths after violence spilled over a nearby shopping mall.
3. At least 1,833 people died in hurricane.
Since the task of automatic recognition of causality is quite challenging, researchers have addressed this problem by considering specific constructions.
For example, various models have been proposed to identify causation between verbs (Bethard and Martin, 2008;Beamer and Girju, 2009;Riaz and Girju, 2010;Do et al., 2011;Riaz and Girju, 2013) and between nouns (Girju and Moldovan, 2002;Girju, 2003). Do et al. (2011) have worked with verb-noun pairs for causality detection but they focused only on a small list of predefined nouns representing events.
In this paper, we focus on the task of identifying causality encoded by verb-noun pairs (example 3). We propose a novel model which first predicts cause or non-cause relations using a supervised classifier and then incorporates additional types of knowledge to reduce errors in the predictions. Using a supervised classifier, our model identifies causation by employing shallow linguistic features (e.g., lemmas of the verb and noun, words between the verb and noun). Such features have been used successfully for various NLP tasks (e.g., part-of-speech tagging, named entity recognition, etc.), but confinement to such features does not yield strong performance for identifying causation (Riaz and Girju, 2013). Therefore, in our model we plug in additional types of knowledge to obtain better predictions for the current task. For example, we identify the semantic classes of nouns and verbs with a high tendency to encode cause or non-cause relations and use this knowledge to achieve better performance. Specifically, the contributions of this paper are as follows:
• In order to build a supervised classifier, we use the annotations of FrameNet to generate a training corpus of verb-noun instances encoding cause and non-cause relations. We propose a set of linguistic features to learn and identify causal relations.
• In order to make intelligent predictions, it is important for our model to have knowledge of the semantic classes of nouns with a high tendency to encode causal or non-causal relations. For example, a named entity such as a person, organization or location may have a high tendency to encode non-causality unless a metonymic reading is associated with it. In our approach, we identify such semantic classes of nouns by exploiting a named entity recognizer, the annotations of frame elements provided in FrameNet, and WordNet.
• Verbs are the key components of language for expressing events of various types. For example, Pustejovsky et al. (2003) have classified events into eight semantic classes: OCCURRENCE, PERCEPTION, REPORTING, ASPECTUAL, STATE, I STATE, I ACTION and MODAL. We argue that some semantic classes in this list have a high tendency to encode cause or non-cause relations. For example, reporting events, represented by verbs such as say, tell, etc., have a high tendency to just report other events instead of encoding causality with them. In our model, we use such information to reduce errors in predictions.
• Each causal relation is characterized by two roles, i.e., a cause and its effect. In example 3 above, the noun "hurricane" is the cause and the verb "died" is its effect. However, a verb-noun pair may not encode causality when the verb and the noun represent the same event. For example, in the instance "Colin Powell presented further evidence in his presentation.", the verb "presented" and the noun "presentation" represent the same event of "presenting" and thus encode a non-cause relation with each other. In our model, we determine the verb-noun pairs representing same or distinct events and make predictions accordingly.
• We adopt the framework of Integer Linear Programming (ILP) (Roth and Yih, 2004; Do et al., 2011) to combine all the above types of knowledge for the current task.
This paper is organized as follows. In the next section, we briefly review previous research on identifying causality. We introduce our model and evaluation with a discussion of results in sections 3 and 4, respectively. Section 5 concludes our current research.

Related Work
In computational linguistics, researchers have always shown interest in the task of automatic recognition of causal relations because success on this task is critical for various natural language applications (Girju, 2003;Chklovski and Pantel, 2004;Radinsky and Horvitz, 2013).
Following the successful employment of linguistic features for various tasks (e.g., part-of-speech tagging, named entity recognition, etc.), NLP researchers initially proposed approaches relying mainly on such features to identify causality (Girju, 2003; Bethard and Martin, 2008; Sporleder and Lascarides, 2008). However, researchers have recently shifted their attention from these features and tried to consider other sources of knowledge for extracting causal relations (Beamer and Girju, 2009; Riaz and Girju, 2010; Do et al., 2011; Riaz and Girju, 2013). For example, Riaz and Girju (2010) and Do et al. (2011) have proposed unsupervised metrics for learning causal dependencies between two events. Do et al. (2011) have also incorporated minimal supervision with unsupervised metrics. For a pair of events (a, b), their model makes the decision of a cause or non-cause relation based on unsupervised co-occurrence counts and then improves this decision by using minimal supervision from causal and non-causal discourse markers (e.g., because, although, etc.).
In search of novel and effective types of knowledge to identify causation between two verbal events, Riaz and Girju (2013) have proposed a model to learn a Knowledge Base (KB_c) of verb-verb pairs. In this knowledge base, English verb-verb pairs are automatically classified into three categories: (1) Strongly Causal, (2) Ambiguous and (3) Strongly Non-Causal. The Strongly Causal and Strongly Non-Causal categories contain the verb-verb pairs with the highest and least tendency to encode causality, respectively, and the rest of the verb-verb pairs are considered ambiguous, with a tendency to encode both types of relations. They claim that this knowledge base of verb-verb pairs is a rich source of causal associations, and that incorporating this resource into a causality detection model can help identify causality with better performance. In this research, we also try to go beyond the scope of shallow linguistic features and identify additional interesting types of knowledge for the current task.

Computational Model for Identifying Causality
In this section, we introduce our model for identifying causality encoded by verb-noun pairs. Specifically, we extract all main verbs and noun phrases from a sentence and predict cause or non-cause relations on verb-noun phrase (v-np) pairs. In order to make the task easier, we consider only those v-np pairs where v (verb) is grammatically connected to np (noun phrase). We assume that a v and an np are grammatically connected if there exists a dependency relation between them in the dependency tree. We apply a dependency parser (Marneffe et al., 2006) to identify such dependencies. Our model first employs a supervised classifier relying on linguistic features to make binary predictions (i.e., does a verb-noun phrase pair encode a cause or non-cause relation?). We then incorporate additional types of knowledge on top of these binary predictions to improve performance.
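The pair-extraction step described above can be sketched as follows. A hand-built dependency parse stands in for real parser output here; the token indices, POS tags, and dependency arcs below are illustrative assumptions, not actual parser output.

```python
def extract_vnp_pairs(tokens, noun_phrases, deps):
    """tokens: list of (word, pos) tuples; noun_phrases: list of sets of
    token indices; deps: set of (head, dependent) dependency arcs."""
    pairs = []
    verbs = [i for i, (_, pos) in enumerate(tokens) if pos.startswith("VB")]
    for v in verbs:
        for np in noun_phrases:
            # keep the pair only if the verb is grammatically connected
            # to some word of the noun phrase (a dependency arc exists)
            if any((v, w) in deps or (w, v) in deps for w in np):
                phrase = " ".join(tokens[i][0] for i in sorted(np))
                pairs.append((tokens[v][0], phrase))
    return pairs

# "At least 1,833 people died in hurricane." (example 3)
tokens = [("At", "IN"), ("least", "JJS"), ("1,833", "CD"), ("people", "NNS"),
          ("died", "VBD"), ("in", "IN"), ("hurricane", "NN")]
noun_phrases = [{0, 1, 2, 3}, {6}]
deps = {(4, 3), (4, 6)}   # died->people (nsubj), died->hurricane (prep_in)
print(extract_vnp_pairs(tokens, noun_phrases, deps))
# [('died', 'At least 1,833 people'), ('died', 'hurricane')]
```

In practice the arcs would come from the Stanford dependency parser applied to the sentence, but the filtering logic is the same.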

Supervised Classifier
In this section, we propose a basic supervised classifier to identify causation encoded by v-np pairs. To set up this supervised classifier, we need a training corpus of instances of v-np pairs encoding cause and non-cause relations. For this purpose, we employ the annotations of the FrameNet project (Baker et al., 1998) provided for verbs. For example, consider the following annotation from FrameNet for the verb "dying" with the argument "solvent abuse", where the pair "dying-solvent abuse" encodes causality.
A campaign has started to try to cut the rising number of children dying [cause from solvent abuse].
To generate a training corpus, we collect annotations of verbs from FrameNet such that the annotated element (aka. frame element) is a noun phrase. For example, we get a causal training instance of the "dying-solvent abuse" pair from the above annotation. We assume that if a FrameNet annotated element contains a verb then it may not represent a training instance of a v-np pair. For example, we do not consider the following annotation in our training corpus, where causality is encoded between two verbs, i.e., "died-fell".
A fitness fanatic died [ cause when 26 stone of weights fell on him as he exercised].
After extracting training instances from FrameNet, we assign them cause (c) and non-cause (¬c) labels. We manually examined the inventory of labels of FrameNet and use the following scheme to assign c or ¬c to each training instance. All the annotations of FrameNet with the following labels are considered causal training instances and the rest of the annotations are considered non-causal training instances.
Purpose, Internal cause, Result, External cause, Cause, Reason, Explanation, Required situation, Purpose of Event, Negative consequences, resulting action, Effect, Cause of shine, Purpose of Goods, Response action, Enabled situation, Grinding cause, Trigger
For this work, we have acquired 2,158 (65,777) cause (non-cause) training instances from FrameNet. Since the non-cause instances vastly outnumber the cause instances, a supervised model trained on all of them tends to assign the non-cause label to almost all instances. Therefore, we employ an equal number of cause and non-cause instances for training. In the future, we plan to extract more annotations from FrameNet and employ multiple human annotators to assign the cause and non-cause labels using the full inventory of labels of FrameNet.
We use the following set of features for the supervised classifier:
• Lexical Features: verb, lemma of verb, noun phrase, lemmas of all words of the noun phrase, head noun of the noun phrase, lemmas of all words between the verb and the head noun of the noun phrase.
• Semantic Features: We adopted this feature from Girju (2003) to capture the semantics of nouns. The 9 noun hierarchies of WordNet, i.e., entity, psychological feature, abstraction, state, event, act, group, possession, phenomenon, are used as this feature. Each of these hierarchies is set to 1 if any sense of the head noun of the noun phrase lies in that hierarchy, and otherwise set to 0.
• Structural Features: This feature is applied by considering both the subject (i.e., sub_in_np) and the object (i.e., obj_in_np) of a verb. For example, for a v-np pair the variable sub_in_np is set to 1 if the subject of v is contained in np, set to 0 if the subject of v is not contained in np, and set to -1 if the subject of v is not available in the instance. The subject and object of a verb are its core arguments and may sometimes be part of the event represented by the verb. Therefore, these arguments may have a high tendency to encode non-cause relations.
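The semantic and structural features above can be sketched as follows. A small hand-rolled sense table stands in for WordNet; the nouns and their hierarchy memberships below are illustrative assumptions, not actual WordNet data.

```python
# The 9 WordNet noun hierarchies used as binary semantic features.
HIERARCHIES = ["entity", "psychological_feature", "abstraction", "state",
               "event", "act", "group", "possession", "phenomenon"]

# Hypothetical toy sense table: noun -> hierarchies its senses fall in.
TOY_SENSES = {
    "hurricane": {"phenomenon", "event"},
    "people": {"entity", "group"},
}

def semantic_features(head_noun):
    """One binary feature per hierarchy: 1 if any sense lies in it."""
    roots = TOY_SENSES.get(head_noun, set())
    return {h: int(h in roots) for h in HIERARCHIES}

def structural_features(subj, obj, np_words):
    """sub_in_np / obj_in_np: 1 if contained in np, 0 if not, -1 if absent."""
    def flag(arg):
        if arg is None:
            return -1
        return int(arg in np_words)
    return {"sub_in_np": flag(subj), "obj_in_np": flag(obj)}

feats = semantic_features("hurricane")
feats.update(structural_features("people", None, {"hurricane"}))
print(feats["phenomenon"], feats["sub_in_np"], feats["obj_in_np"])  # 1 0 -1
```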
We set up the following integer linear program after acquiring predictions of c and ¬c labels using our supervised classifier:

maximize Z_1 = Σ_{v-np ∈ I} Σ_{l ∈ L_1} P(v-np, l) · x_1(v-np, l)    (1)
subject to Σ_{l ∈ L_1} x_1(v-np, l) = 1  ∀ v-np ∈ I    (2)
x_1(v-np, l) ∈ {0, 1}    (3)

Here L_1 = {c, ¬c}, I is the set of all instances of v-np pairs and x_1(v-np, l) is the decision variable set to 1 only if the label l ∈ L_1 is assigned to v-np. Equation 2 constrains that only one label out of |L_1| choices can be assigned to a v-np pair. Equation 3 requires x_1(v-np, l) to be a binary variable. Specifically, we try to maximize the objective function Z_1 (equation 1), which assigns the label cause or non-cause to all v-np pairs (i.e., sets the variables x_1(v-np, l) to 1 or 0 for all l ∈ L_1 and for all v-np pairs in I) depending on the probabilities of assignment of labels (i.e., P(v-np, l)). These probabilities can be obtained by running a supervised classification algorithm (e.g., Naive Bayes or Maximum Entropy). In our experiments, we provide results using the following probabilities acquired with Naive Bayes:
P(v-np, l) = P(l) ∏_{k=1}^{n} P(f_k | l) / Σ_{l' ∈ L_1} P(l') ∏_{k=1}^{n} P(f_k | l')    (4)

where f_k is a feature, n is the total number of features and P(f_k | l) is the smoothed probability of a feature f_k given the training instances of label l.
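Under constraints (2)-(3) alone, maximizing Z_1 decouples across pairs: each pair simply receives its most probable label, so the basic classifier can be sketched without an ILP solver. The toy training instances below are illustrative, and add-one (Laplace) smoothing is assumed as the smoothing scheme.

```python
from collections import defaultdict

def train_nb(instances):
    """instances: list of (feature_dict, label) pairs."""
    label_counts = defaultdict(int)
    feat_counts = defaultdict(int)   # (label, feature, value) -> count
    for feats, label in instances:
        label_counts[label] += 1
        for f, v in feats.items():
            feat_counts[(label, f, v)] += 1
    return label_counts, feat_counts

def posterior(feats, label_counts, feat_counts):
    """Smoothed Naive Bayes probabilities P(v-np, l) as in equation 4."""
    total = sum(label_counts.values())
    scores = {}
    for l, lc in label_counts.items():
        p = lc / total               # prior P(l)
        for f, v in feats.items():
            # add-one smoothing; binary features have 2 possible values
            p *= (feat_counts[(l, f, v)] + 1) / (lc + 2)
        scores[l] = p
    z = sum(scores.values())
    return {l: s / z for l, s in scores.items()}

train = [({"phenomenon": 1}, "c"), ({"phenomenon": 1}, "c"),
         ({"phenomenon": 0}, "not_c")]
lc, fc = train_nb(train)
post = posterior({"phenomenon": 1}, lc, fc)
label = max(post, key=post.get)   # per-pair argmax == basic ILP solution
print(label)  # c
```

The hard constraints added in the following sections are what make the ILP formulation non-trivial.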

Knowledge of Semantic classes of nouns
Philosopher Jaegwon Kim (Kim, 1993) (as cited by Girju and Moldovan (2002)) pointed out that the entities which represent either causes or effects are often events, but also conditions, states, phenomena, processes, and sometimes even facts. Accordingly, our model should have knowledge of the semantic classes of noun phrases with a high tendency to encode cause or non-cause relations. Using this type of knowledge, we can automatically review and correct wrong predictions made by our basic supervised classifier.
We argue that if a noun phrase represents a named entity then it has the least tendency to encode causal relations unless there is a metonymic reading associated with it. For example, consider the following cause and non-cause examples where the noun phrase is a named entity.
4. Sandy hit Cuba as a Category 3 hurricane.
5. Almost all the weapon sites in Iraq were destroyed by the United States.
In example 4, Cuba is a location and does not encode causality. However, in example 5 the pair "destroyed-the United States" encodes causality, where a metonymic reading is associated with the location. We apply a named entity recognizer (Finkel et al., 2005) and assume that if a noun phrase is identified as a named entity then its corresponding verb-noun phrase pair encodes a non-cause relation. This constraint can lead to a false negative prediction when a metonymic reading is associated with a noun phrase. In order to avoid as many false negatives as possible, we apply the following simple rule: if one of the following cue words appears between a verb and a noun phrase then we do not apply the constraint stated above.
by, from, because of, through, for
In our experiments, the above simple rule helps avoid some false negatives, but in the future any subsequent improvement with a better metonymy resolver (Markert and Nissim, 2009) should improve the performance of our model.
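The cue-word rule can be sketched as a simple check over the tokens appearing between the verb and the noun phrase:

```python
# Cue words that signal a possible metonymic reading; if one appears
# between the verb and the noun phrase, the named-entity constraint
# is suspended.
CUES = ["by", "from", "because of", "through", "for"]

def metonymy_possible(words_between):
    """words_between: tokens between the verb and the noun phrase."""
    text = " " + " ".join(words_between).lower() + " "
    return any(" " + cue + " " in text for cue in CUES)

# "destroyed by the United States" -> the constraint is suspended
print(metonymy_possible(["by"]))   # True
# "hit Cuba" -> nothing between verb and NP, the constraint applies
print(metonymy_possible([]))       # False
```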
In addition to named entities, there can be various noun phrases with the least tendency to encode causation. Consider the following example, where "city" is a location and does not encode a cause-effect relation with the verb "remained".
Substantially fewer people remained in the city during the Hurricane Ivan evacuation.
In this work, we identify the semantic classes of noun phrases which do not normally represent events, conditions, states, phenomena, or processes and thus have a high tendency to encode non-cause relations. For this purpose, we manually examine the inventory of labels assigned to noun phrases in FrameNet (see table 1) and classify these labels into two classes (c_np and ¬c_np). Here, the class c_np (¬c_np) represents the labels of noun phrases with a high (less) tendency to encode cause-effect relations. For example, the label "Place" ∈ ¬c_np (see table 1) represents a location and may have the least tendency to encode causality if metonymy is not associated with it. Using the classification of frame elements in table 1, we obtain the annotations of noun phrases from FrameNet and categorize these annotations into the c_np and ¬c_np classes. On top of the annotations of these two semantic classes, we build a supervised classifier for predicting the c_np or ¬c_np label for noun phrases. After obtaining predictions, we select all noun phrases lying in the class ¬c_np and apply the same constraint stated above for the named entities. We use the following set of features to set up the supervised classifier for the c_np and ¬c_np labels. We have acquired 23,334 (81,279) training instances of the c_np (¬c_np) class, respectively, for this work. We also use WordNet to obtain more training instances of these classes. We follow an approach similar to Girju and Moldovan (2002) and adopt some senses of WordNet (shown in table 1) to acquire training instances of noun phrases. For example, considering table 1, we assign the ¬c_np label to any noun all of whose senses in WordNet lie in the semantic hierarchy originated by the sense {time period, period of time, period}. Following this scheme, we extract instances of nouns and noun phrases from the English GigaWord corpus and assign the labels c_np and ¬c_np to them by employing the WordNet senses given in table 1.
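The WordNet labeling rule (assign ¬c_np when all senses of a noun fall under a designated hierarchy) can be sketched as below. The sense table and the set of non-causal hierarchy roots are hypothetical stand-ins for the actual WordNet senses of table 1.

```python
# noun -> one set of hierarchy roots per sense (hypothetical toy data)
TOY_SENSES = {
    "week": [{"time_period"}],
    "hurricane": [{"phenomenon"}, {"event"}],
}

# assumed stand-in for the non-causal senses listed in table 1
NON_CAUSAL_ROOTS = {"time_period", "location"}

def label_np(noun):
    """Assign not_c_np only when every sense lies in a non-causal hierarchy."""
    senses = TOY_SENSES.get(noun, [])
    if senses and all(s & NON_CAUSAL_ROOTS for s in senses):
        return "not_c_np"
    return "c_np"

print(label_np("week"))       # not_c_np
print(label_np("hurricane"))  # c_np
```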
Girju and Moldovan (2002) have used a similar scheme to rank noun phrases according to their tendency to encode causation. In comparison to them, we use the WordNet senses to increase the size of our training set of noun phrases obtained using FrameNet above. In addition, we build an automatic classifier on the training data obtained using the labels of FrameNet and the WordNet senses to classify the noun phrases of test instances into the two semantic classes (i.e., c_np and ¬c_np). Our training corpus contains 2,214,68 instances of noun phrases (50% belonging to each of the c_np and ¬c_np classes).
We incorporate the knowledge of semantics of nouns in our model by making the following additions to the integer linear program introduced in section 3.1.
maximize Z_2 = Z_1 + Σ_{np} Σ_{l ∈ L_2} P(np, l) · x_2(np, l)    (5)
subject to Σ_{l ∈ L_2} x_2(np, l) = 1  ∀ np    (6)
x_2(np, l) ∈ {0, 1}    (7)
x_1(v-np, ¬c) ≥ x_2(np, ¬c_np)  ∀ v-np ∈ I − M    (8)

Here L_2 = {c_np, ¬c_np} and M is the set of instances of those v-np pairs for which we consider the possibility of the attachment of a metonymic reading to np; x_2(np, l) is the decision variable set to 1 only if the label l ∈ L_2 is assigned to np. Equation 6 constrains that only one label out of |L_2| choices can be assigned to an np. Equation 7 requires x_2(np, l) to be a binary variable. Constraint 8 assumes that if an np belongs to the semantic class ¬c_np then its corresponding pair v-np is assigned the label ¬c. We maximize the objective function Z_2 (equation 5) of our integer linear program subject to the constraints introduced above. We predict the semantic class of a noun phrase using the supervised classifier for the c_np and ¬c_np classes and set the probabilities as P(np, l) = 1, P(np, {L_2} − {l}) = 0 if the label l ∈ L_2 is assigned to np. Again we use Naive Bayes to predict the labels for noun phrases. Also, before running this supervised classifier, we run the named entity recognizer and assign the ¬c_np label to all noun phrases identified as named entities. For our model, we apply the named entity recognizer for seven classes, i.e., LOCATION, PERSON, ORGANIZATION, DATE, TIME, MONEY, PERCENT (Finkel et al., 2005).
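Outside an ILP solver, this hard constraint amounts to overriding the basic classifier's prediction; a minimal sketch:

```python
def apply_noun_class_constraint(basic_pred, np_class, metonymy_cue):
    """basic_pred: 'c' or 'not_c' from the basic classifier;
    np_class: 'c_np' or 'not_c_np' predicted for the noun phrase;
    metonymy_cue: True when the pair is in the exception set M."""
    if np_class == "not_c_np" and not metonymy_cue:
        return "not_c"    # force the non-cause label
    return basic_pred     # otherwise keep the classifier's label

# "hit Cuba": NE location, no cue word -> forced to non-cause
print(apply_noun_class_constraint("c", "not_c_np", False))  # not_c
# "destroyed by the United States": cue word "by" -> classifier decides
print(apply_noun_class_constraint("c", "not_c_np", True))   # c
```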

Knowledge of Semantic classes of verbs
In this section, we introduce our method to incorporate the knowledge of semantic classes of verbs to identify causation. Verbs are the components of language for expressing events of various types. In the TimeBank corpus, Pustejovsky et al. (2003) have annotated verbal events with the eight semantic classes listed in the introduction. According to Pustejovsky et al. (2003), the reporting events describe the action of a person declaring something or narrating an event, e.g., the reporting events represented by the verbs say, tell, etc. Here, we argue that a reporting event has the least tendency to encode causation because such an event only describes or narrates another event instead of encoding causality with it. We assume that the verbs representing reporting events have the least tendency to encode causation and thus their corresponding v-np pairs have the least tendency to encode causation. To add this knowledge to our model, we consider two classes of verbs, i.e., c_v and ¬c_v, where the class c_v (¬c_v) contains the verbs with a high (less) tendency to encode causation. Using the above argument, we claim that all verbs representing reporting events belong to the ¬c_v class and the verbs representing the rest of the types of events belong to the c_v class. We build a supervised classifier which automatically classifies verbs into the c_v and ¬c_v classes. We extract the instances of verbal events (i.e., verbs or verbal phrases) from the TimeBank corpus and assign the labels c_v and ¬c_v to these instances. Using these labeled instances, we build a supervised classifier by adopting the same set of features as introduced in Bethard and Martin (2006) to identify semantic classes of verbs. Due to space constraints, we refer the reader to Bethard and Martin (2006) for the details of the features. Again we use Naive Bayes to obtain predictions of the c_v and ¬c_v labels and their corresponding probabilities using equation 4. We incorporate the knowledge of the semantics of verbs into our model by making the following additions to the integer linear program:
maximize Z_3 = Z_2 + Σ_{v} Σ_{l ∈ L_3} P(v, l) · x_3(v, l)    (9)
subject to Σ_{l ∈ L_3} x_3(v, l) = 1  ∀ v    (10)
x_3(v, l) ∈ {0, 1}    (11)
x_1(v-np, ¬c) ≥ x_3(v, ¬c_v)    (12)
x_3(v, c_v) ≥ x_1(v-np, c)    (13)

Here L_3 = {c_v, ¬c_v} and x_3(v, l) is the decision variable set to 1 only if the label l ∈ L_3 is assigned to v. Equation 10 constrains that only one label out of |L_3| choices can be assigned to a v. Equation 11 requires x_3(v, l) to be a binary variable. Constraint 12 enforces that if a verb v belongs to the class ¬c_v (i.e., has the least potential to encode causation) then its corresponding v-np pair is assigned the label ¬c. Similarly, constraint 13 enforces that if a v-np pair encodes causality then its verb v has the potential to encode a causal relation. We maximize the objective function Z_3 subject to the constraints introduced above.

Knowledge of Indistinguishable Verb and Noun
As introduced earlier, each causal relation is characterized by two roles, i.e., a cause and its effect. In order to encode a causal relation, the two components of an instance of a verb-noun phrase pair need to represent distinct events, processes or phenomena. Employing simple lexical matching, we determine whether a verb and a noun phrase represent the same event as follows:
• We use NOMLEX (Macleod et al., 2009) to transform a verb into its corresponding nominalization and use the following text segments for lexical matching.
T_v = [Subject] verb [Object]
T_n = Head noun of noun phrase
• We remove stopwords and duplicate words from T_v and T_n and take the lemmas of all words. If the subject or object or both arguments are contained in the noun phrase then we remove these arguments from T_v. We determine the probability of a verb (v) and a noun phrase (np) representing the same event as follows: if the head noun (i.e., T_n) lexically matches any word of T_v then we set P(v ≡ np) to 1, and 0 otherwise. We assign a non-cause relation if P(v ≡ np) = 1. Next, we incorporate the knowledge of an indistinguishable verb and noun into our model using the following additions to our integer linear program:
maximize Z_4 = Z_3 + Σ_{v-np ∈ I} Σ_{l ∈ L_4} P(v-np, l) · x_4(v-np, l)    (14)
subject to Σ_{l ∈ L_4} x_4(v-np, l) = 1  ∀ v-np ∈ I    (15)
x_4(v-np, l) ∈ {0, 1}    (16)
x_1(v-np, ¬c) ≥ x_4(v-np, ≡)    (17)

Here L_4 = {≡, ≢}, where the label ≡ (≢) represents same (distinct) events, and x_4(v-np, l) is the decision variable set to 1 only if the label l ∈ L_4 is assigned to v-np. Equation 15 constrains that only one label out of |L_4| choices can be assigned to a v-np pair. Equation 16 requires x_4(v-np, l) to be a binary variable. Constraint 17 enforces that if a v-np pair belongs to the class ≡ then this pair is assigned the label ¬c. We maximize the objective function Z_4 subject to the constraints introduced above.
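The lexical-matching test described above can be sketched as follows. The nominalization table and stopword list are tiny hypothetical stand-ins for NOMLEX and a real stopword list.

```python
NOMINALIZATIONS = {"present": "presentation", "die": "death"}
STOPWORDS = {"the", "a", "an", "his", "her", "in", "further"}

def same_event(verb_lemma, subj, obj, head_noun, np_words):
    """Return 1 if the v-np pair represents the same event, i.e. P(v == np)."""
    # build T_v = [Subject] verb [Object], dropping arguments that are
    # contained in the noun phrase, then remove stopwords
    t_v = [NOMINALIZATIONS.get(verb_lemma, verb_lemma)]
    for arg in (subj, obj):
        if arg is not None and arg not in np_words:
            t_v.append(arg.lower())
    t_v = [w for w in t_v if w not in STOPWORDS]
    # T_n is the head noun; the pair is "same event" on a lexical match
    return int(head_noun.lower() in t_v)

# "Colin Powell presented further evidence in his presentation."
print(same_event("present", "Powell", "evidence", "presentation",
                 {"his", "presentation"}))   # 1
# "At least 1,833 people died in hurricane." (example 3)
print(same_event("die", "people", None, "hurricane", {"hurricane"}))  # 0
```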

Evaluation and Discussion
In this section we present the experiments, evaluation procedures, and a discussion on the results achieved through our model for the current task.
In order to evaluate our model, we generated a test set with instances of the form verb-noun phrase, where the verb is grammatically connected to the noun phrase in an instance. For this purpose, we collected three Wikipedia articles on the topics of Hurricane Katrina, the Iraq War and the Egyptian Revolution of 2011. We selected the first 100 sentences from these articles and applied a part-of-speech tagger (Toutanova et al., 2003) and a dependency parser (Marneffe et al., 2006) to these sentences. From each sentence, we extracted all verb-noun phrase pairs where the verb has a dependency relation with any word of the noun phrase. We manually inspected all of the extracted instances and removed those instances in which a word had been wrongly classified as a verb by the part-of-speech tagger. There are a total of 1,106 instances in our test set. We assigned the task of annotating these instances with cause and non-cause relations to a human annotator. Using the manipulation theory of causality (Woodward, 2008), we adopted the annotation guidelines from Riaz and Girju (2010), which are as follows: "Assign the cause label to a pair (a, b) if the following two conditions are satisfied: (1) a temporally precedes/overlaps b in time, (2) while keeping as many states of affairs constant as possible, modifying a must entail predictably modifying b. Otherwise assign the non-cause label." We have 149 (957) cause (non-cause) instances in our test set, respectively. We evaluate the performance of our model using the F-score and accuracy evaluation measures (see table 2 for results).
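The evaluation measures, with "cause" as the positive class, can be computed as follows; the gold/pred lists below are illustrative, not the actual test set.

```python
def evaluate(gold, pred):
    """Precision, recall, F-score and accuracy with 'c' as positive class."""
    tp = sum(g == p == "c" for g, p in zip(gold, pred))
    fp = sum(g == "not_c" and p == "c" for g, p in zip(gold, pred))
    fn = sum(g == "c" and p == "not_c" for g, p in zip(gold, pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1, acc

gold = ["c", "c", "not_c", "not_c", "not_c"]
pred = ["c", "not_c", "c", "not_c", "not_c"]
prec, rec, f1, acc = evaluate(gold, pred)
print(round(f1, 2), acc)  # 0.5 0.6
```

On a skewed test set such as ours, accuracy alone is misleading, which is why both measures are reported.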
The results in table 2 reveal that the basic supervised classifier is a naive model and achieves only a 27.27% F-score and 46.47% accuracy.
The addition of the novel types of knowledge introduced in section 3 (i.e., the model Basic+SCN_M+SCV+IVN) brings 14.74% (29.57%) improvements in F-score (accuracy), respectively. These results show that the knowledge of the semantics of nouns and verbs and the knowledge of an indistinguishable verb and noun are critical to achieving good performance. The maximum improvement in results is achieved with the addition of the semantic classes of nouns (i.e., Basic+SCN_M). The consideration of the association of metonymic readings using the model Basic+SCN_M helps us maintain recall as compared with SCN_!M and therefore brings a better F-score.
One can notice that almost all models suffer from low precision, which leads to lower F-scores. Although our model achieves a 14.58% increase in precision over the basic supervised classifier, the lack of high precision is still responsible for the lower F-score. The highly skewed distribution of the test set, with only 13.47% causal instances, results in many false positives. We manually examined the false positives to determine the language features which may help us reduce more false positives without affecting the F-score. We noticed that the direct objects of verbs are mostly part of the event represented by the verb and therefore encode non-causation with the verb. For example, consider the following instances:
6. The hurricane surge protection failures prompted a lawsuit.
7. They provided weather forecasts.

Table 2: This table presents the results of the basic supervised classifier (i.e., Basic) and the models after incrementally adding the knowledge of semantic classes of nouns without consideration of metonymic readings (i.e., +SCN_!M), the knowledge of semantic classes of nouns with consideration of metonymic readings (i.e., +SCN_M), the knowledge of semantic classes of verbs (i.e., +SCN_M+SCV) and the knowledge of an indistinguishable verb and noun (i.e., +SCN_M+SCV+IVN).
In example 6, "lawsuit" is the direct object of the verb "prompted" and is part of the event represented by the verb "prompt". However, there is a cause relation between "protection failures" and "prompted". Similarly, in example 7, the direct object "forecasts" is part of the "providing" event and thus the noun phrase "weather forecasts" encodes a non-cause relation with the verb "provide". Following this observation, we employed the training corpus of cause and non-cause relations (see section 3.1) and learned the structure of the verb-noun phrase pairs encoding non-cause relations most of the time. We considered only those training instances where the subject and/or object of the verb was available. For the current purpose, we picked the following four features: (1) sub_in_np, (2) !sub_in_np, (3) obj_in_np and (4) !obj_in_np. Just to remind the reader, the feature sub_in_np (!sub_in_np) is set to 1 if the subject of the verb is (not) contained in the noun phrase np, respectively. For each of the above four features, the percentage of cause relations and the entropy of relations with that feature are as follows: There are two important observations from the above scores: (1) verbs mostly encode non-cause relations with their objects and subjects (i.e., high %¬c with obj_in_np and sub_in_np), (2) among the obj_in_np and sub_in_np features, obj_in_np yields the least entropy, i.e., there is the least chance of a verb encoding causality with its object.
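The per-feature statistics, the percentage of cause relations and the entropy of the {c, ¬c} distribution among instances carrying a feature, can be computed as in this sketch over illustrative toy instances:

```python
import math

def feature_stats(instances, feature):
    """instances: list of (feature_dict, label); feature: name set to 1.
    Returns (%c, entropy of the label distribution) for that feature."""
    labels = [l for f, l in instances if f.get(feature) == 1]
    p_c = sum(l == "c" for l in labels) / len(labels)
    entropy = 0.0
    for p in (p_c, 1 - p_c):
        if p > 0:
            entropy -= p * math.log2(p)   # binary entropy in bits
    return p_c, entropy

# hypothetical toy instances: objects mostly encode non-cause relations
data = [({"obj_in_np": 1}, "not_c"), ({"obj_in_np": 1}, "not_c"),
        ({"obj_in_np": 1}, "not_c"), ({"obj_in_np": 1}, "c")]
p_c, h = feature_stats(data, "obj_in_np")
print(round(p_c, 2), round(h, 2))  # 0.25 0.81
```

A low entropy indicates the label distribution for a feature is highly predictable, which is the basis of the observation about obj_in_np above.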
Considering the above statistics, we enforce the constraint on each verb-noun phrase pair that if the object of the verb is contained in the noun phrase of the pair then the non-cause relation is assigned to that pair. Using this constraint, we obtain a 46.61% (80.74%) F-score (accuracy), respectively. This confirms our observation that the object of a verb is normally part of the event represented by the verb and thus encodes a non-cause relation with the verb.
In this research, we have utilized novel types of knowledge to improve the performance of our model. In the future, we need to consider additional information (e.g., predictions from a metonymy resolver) to achieve further progress.

Conclusion
In this paper, we have proposed a model for identifying causality in verb-noun pairs by employing the knowledge of semantic classes of nouns and verbs and the knowledge of an indistinguishable noun and verb of an instance, along with shallow linguistic features. Our empirical evaluation of the model has revealed that such novel types of knowledge are critical to achieving better performance on the current task. Following the encouraging results achieved by our model, we invite researchers to investigate more interesting types of knowledge in the future to make further progress on the task of recognizing causality.