Light verb constructions with ‘do’ and ‘be’ in Hindi: A TAG analysis

In this paper we present a Lexicalized Feature-based Tree-Adjoining Grammar analysis for a type of nominal predicate that occurs in combination with the light verbs “do” and “be” (Hindi kar and ho respectively). Light verb constructions are a challenge for computational grammars because they are a highly productive predicational strategy in Hindi. Such nominals have been discussed in the literature (Mohanan, 1997; Ahmed and Butt, 2011; Bhatt et al., 2013), but this work is a first attempt at a Tree-Adjoining Grammar (TAG) representation. We look at three possibilities for the design of elementary trees in TAG and explore one option in depth using Hindi data. In this analysis, the nominal is represented with all the arguments of the light verb construction, while the light verb adjoins into its elementary tree.


Introduction
Lexical resource development for computational analyses in Hindi must contend with a large number of light verb constructions. For instance, in the Hindi Treebank (Palmer et al., 2009), nearly 37% of the predicates have been annotated as light verb constructions. Hence, the combination of a noun with a light verb is a productive predicational strategy in Hindi. For example, the noun yaad 'memory' combines with kar 'do' to form yaad kar 'remember'.
In light verb constructions, the noun is a predicating element along with the light verb. The presence of two predicating elements representing a single meaning is a challenge for a linguistic theory that maps between syntax and semantics. Consequently, the argument structure representation for light verb constructions (LVCs) has resulted in two opposing views in syntactic theory. One view supports a noun-centric analysis of the LVC, where the noun is represented with all the arguments of the LVC (e.g., Grimshaw and Mester, 1988; Kearns, 1988). Under this view, the light verb's only role is to theta-mark the arguments of the LVC, without any semantic contribution. The second view proposes argument sharing between the noun and the light verb, as they both contribute to the argument structure of the LVC (Butt, 1995; Ahmed et al., 2012). We refer to such analyses as verb-centric analyses.
Within the framework of this debate, we propose to use Lexicalized Feature-based Tree-Adjoining Grammar, which is a variant of Tree-Adjoining Grammar (TAG). TAG has been used to represent light verb constructions in French (Abeillé, 1988) and Korean (Han and Rambow, 2000). The primitive structures of TAG are its elementary trees, each of which encapsulates the syntactic and semantic arguments of its lexical anchor (for a light verb construction, the noun and the light verb will each anchor a tree). The association of a structural object with a linguistic anchor allows TAG to specify all the linguistic constraints associated with the anchor over a local domain. This is especially advantageous for composing the complex argument structure of an LVC. In comparison with other formalisms (e.g., context-free grammars), this property gives TAG an extended domain of locality.
In this paper, we look at a particular group of nouns that occur with the light verbs 'do' and 'be' (kar and ho) as part of a light verb construction. The same noun alternates with either light verb, resulting in a change in the argument structure of the verb. For example, the noun chorii 'theft' can occur as either chorii kar 'theft do' or chorii ho 'theft happen'. There are nearly 265 nouns showing this alternation in the Hindi Treebank (Palmer et al., 2009). These constitute about 15% of the total light verb constructions in the Treebank. Note that other light verbs also occur in Hindi, e.g., de 'give' and le 'take', but they are not part of this study.
Section 2 introduces the TAG formalism. Section 3 presents examples of the predicating nominals under study. Section 4 describes the design of the elementary trees that are the basis of the analysis, and the final section summarizes our findings and makes suggestions for future work.

Lexicalized Feature-based Tree-Adjoining Grammar
Tree-Adjoining Grammar (TAG) is a formal tree-rewriting system that is used to describe the syntax of natural languages (Joshi and Schabes, 1997). The basic structure of a TAG grammar is an elementary tree, which is a fragment of a phrase structure tree whose nodes are labelled with terminal and non-terminal symbols. The elementary trees are combined by the operations of substitution (where a frontier node marked for substitution is replaced with a new tree) or adjunction (where an internal node is split to add a new tree).
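The two operations can be illustrated with a minimal Python sketch. It is not an implementation from this paper: the class Node, the functions substitute, adjoin, find_foot and words, and the shapes of the trees for Jill is running are assumptions made for illustration, and feature structures are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # category label (e.g. "S", "NP") or a word
    children: List["Node"] = field(default_factory=list)
    subst: bool = False             # frontier node marked for substitution
    foot: bool = False              # foot node of an auxiliary tree


def substitute(site: Node, initial_root: Node) -> None:
    """Replace a substitution-marked frontier node with an initial tree."""
    assert site.subst and not site.children
    assert site.label == initial_root.label
    site.children = initial_root.children
    site.subst = False


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node below `node`, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(site: Node, aux_root: Node) -> None:
    """Split an internal node and insert an auxiliary tree at that point."""
    foot = find_foot(aux_root)
    assert foot is not None and foot.label == aux_root.label == site.label
    # The material below the adjunction site moves under the foot node.
    foot.children, foot.foot = site.children, False
    # The adjunction site now dominates the body of the auxiliary tree.
    site.children = aux_root.children


def words(node: Node) -> List[str]:
    """Read the leaves from left to right."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in words(child)]


# Elementary trees for "Jill is running" (shapes assumed for illustration).
np_site = Node("NP", subst=True)
vp = Node("VP", [Node("V", [Node("running")])])
running = Node("S", [np_site, vp])
jill = Node("NP", [Node("Jill")])
is_aux = Node("VP", [Node("V", [Node("is")]), Node("VP", foot=True)])

substitute(np_site, jill)   # fills the subject position
adjoin(vp, is_aux)          # splits the VP node and inserts "is"
sentence = " ".join(words(running))
```

In this sketch, substitution only touches a frontier node, whereas adjunction rewires an internal node so that its former daughters hang below the foot of the auxiliary tree.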
The elementary trees in TAG can be enriched with feature structures (Vijay-Shanker and Joshi, 1988). These can capture linguistic descriptions in a more precise manner and also capture adjunction constraints. TAG with feature structures is also known as FTAG (Feature-structure based TAG). A TAG can also be lexicalized, i.e., each elementary tree has a lexical item as one of its terminal nodes. Lexicalized TAG enhanced with feature structures is known as Lexicalized Feature-based Tree-Adjoining Grammar (LF-TAG). This has been used for developing computational grammars for English (XTAG-Group, 2001), French (Abeillé and Candito, 2000) and Korean. In our analysis, we will also use LF-TAG, but we will refer to it as LTAG for convenience. Figure 1 shows the basic steps for composing elementary trees containing feature structures. Each node has a top and a bottom feature structure. Features can be shared among nodes in an elementary tree. In the tree for the verb running, the variable 1 is used to show that the verb must share the same features as the subject NP.
The tree for running is an initial tree with a single frontier node for its argument noun phrase (NP). The tree for is, on the other hand, is a special type of elementary tree called an auxiliary tree. It has a foot node (marked with an asterisk), which has the same label as its root node. The auxiliary tree will adjoin into the tree for running at the VP node only. The top and bottom feature structures for MODE at the VP node have different values (ind and ger), and they cannot unify. This captures an obligatory adjunction constraint and requires adjunction to take place at this node.
During adjunction, the top of the root of the auxiliary tree (for is) will unify with the top of the adjunction site. The bottom of the foot of the auxiliary tree will unify with the bottom of the adjunction site. During substitution, the top node in the tree for Jill unifies with the node at NP.
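The unification steps during adjunction can be sketched as follows. This is a simplified illustration rather than the grammar used in this paper: feature structures are flattened to Python dictionaries, the function unify is a hypothetical helper, and the assignment of ind to the top and ger to the bottom of the VP node, as well as the feature values on the tree for is, are assumptions.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature structures; return None on a value clash."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


# VP node in the tree for "running": top and bottom clash on MODE,
# so the node cannot be resolved unless something adjoins here.
vp_top: Features = {"MODE": "ind"}
vp_bottom: Features = {"MODE": "ger"}
without_adjunction = unify(vp_top, vp_bottom)

# Auxiliary tree for "is" (feature values assumed for illustration).
is_root_top: Features = {"MODE": "ind"}
is_foot_bottom: Features = {"MODE": "ger"}

# Adjunction: the root's top unifies with the site's top,
# and the foot's bottom unifies with the site's bottom.
new_root_top = unify(is_root_top, vp_top)
new_foot_bottom = unify(is_foot_bottom, vp_bottom)
```

Here without_adjunction is None, which mirrors the obligatory adjunction constraint, while both unifications performed during adjunction succeed.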
This results in the second tree in Figure 1, after the operations of substitution and adjunction. In a final derivation step, top and bottom feature structures at each node will unify, to give the final derived tree with a single feature structure at each node. The resulting tree is called a derived tree. Another by-product of the TAG analysis is the derivation tree. This tree has numbered node labels that record the history of composition of the elementary trees. For example, the derivation tree for Jill is running can be seen in Figure 2. The root of this tree is labelled with running, which is an initial tree of the type S.
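A derivation tree of this kind can be written down as a small data structure. The sketch below is illustrative only: the class DerivationNode and the function describe are hypothetical names, and the node addresses 1 and 2 are assumed values (the subject NP taken as the first daughter of S and the VP as the second), not values read off Figure 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DerivationNode:
    tree: str                       # name of the elementary tree
    operation: str                  # "root", "substitution" or "adjunction"
    address: Optional[str] = None   # node address in the parent tree (assumed)
    children: List["DerivationNode"] = field(default_factory=list)


# Derivation of "Jill is running": two trees combine with the tree for running.
derivation = DerivationNode("running", "root", None, [
    DerivationNode("Jill", "substitution", "1"),
    DerivationNode("is", "adjunction", "2"),
])


def describe(node: DerivationNode, depth: int = 0) -> List[str]:
    """List each elementary tree with the operation that introduced it."""
    where = "" if node.address is None else f" at {node.address}"
    lines = ["  " * depth + f"{node.tree} ({node.operation}{where})"]
    for child in node.children:
        lines.extend(describe(child, depth + 1))
    return lines


summary = describe(derivation)
```

Unlike the derived tree, this structure records which elementary trees were used and how they were combined, not the resulting phrase structure.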
An important characteristic of lexicalized elementary trees is their correspondence with that lexical item's predicate-argument structure. This has sometimes been formalized as the PACP (Predicate-Argument Co-occurrence Principle) (Frank, 2002). The PACP restricts the structure of the elementary trees such that they may not be drawn arbitrarily. At the same time, lexicalized TAGs will often have the [...]

Data
In this section, we introduce the nominal predicates that will be the focus of our LTAG analysis. Such nominals allow an agentive (ergative-marked) subject with the light verb kar 'do'. In contrast, the same nominal does not have an agentive subject with ho 'be' (Ahmed and Butt, 2011). The alternation with ho 'be' has an intransitivizing effect. In (1) and (2), a change in the light verb results in the presence or absence of the agent argument. The nominal chorii is the same, but the LVC in (1) requires only a Theme argument, whereas (2) needs an Agent and a Theme.
In English, a similar alternation structure may be found with light verbs in bring to light vs. come to light (Claridge, 2000). Here, two light verbs bring and come are used to express either a causative or inchoative reading. In the Hindi examples, the light verb ho 'be' and the light verb kar 'do' are used to express the inchoative vs. causative reading. In Persian, kardan 'make or do' and ŝodan 'become' are used in a manner similar (although not identical) to Hindi.
The noun chorii 'theft' belongs to a particular class of nouns where a change in the light verb does result in a change in the arguments, but the agent argument is always presupposed, irrespective of the light verb. For instance, the addition of a phrase such as apne-aap 'on its own' is semantically odd with example (1). This is because the event of 'theft' cannot occur without an agent, although the agent is unexpressed with the light verb ho 'be'. Contrast this with (4), where apne-aap is not odd and where the alternation with kar 'do' is not possible. The non-alternating noun afsos 'regret' occurs with an Experiencer subject, which can act spontaneously and hence allows the use of apne-aap.
In order to model such nominals in TAG we have three options. The first is a noun-centric analysis, where the nominal projects all the arguments of the LVC. In reference to the examples above, this would imply that the nominal chorii 'theft' would be represented by two trees, i.e., it would appear with two arguments with kar 'do' and only one with ho 'be'.
The second option is a verb-centric analysis, where the light verb kar 'do' would contribute the agentive argument, and chorii would contribute the object. The nominal's elementary tree would consist of only one argument, regardless of whether it combined with kar 'do' or ho 'be'.
The third option is to represent the LVC chorii kar 'theft do; steal' as the anchor of a single elementary tree, i.e., as a single multiword expression. While the first two options are worth exploring, we discard the third option for two reasons. First, the LVC is highly productive in Hindi, so this option would result in too many elementary trees in the grammar. Second, there is evidence that the LVC forms a phrasal category in the syntax (Mohanan, 1997; Davison, 2005). This means that individual components of the LVC may be moved away from each other, emphatic particles or negation may intervene, and the noun component may be independently modified by an adjective. Therefore, the multiword option would not be the best approach here. This is in contrast to previous TAG analyses for English LVCs, where both nominal and verb are anchored in the same elementary tree (XTAG-Group, 2001).
Figure 3 shows the derivation trees (cf. Figure 2) for the three analysis options described above for the sentence Ram ne gehene chorii kiye 'Ram stole the jewels'.
In this paper we explore a noun-centric analysis of Hindi LVCs. In the analysis that follows, we will describe two elementary trees for a noun like chorii, i.e., one for when it combines with ho 'be' and one for when it combines with kar 'do'. Making the elementary structures richer and more complex increases ambiguity locally, and we then have more descriptions for the same lexical item. But these structures also capture local dependencies, i.e., the fact that the lexical item can appear in varying linguistic environments.
This choice is also in keeping with the TAG notion of using complex elementary structures to capture linguistic properties and having very general operations (substitution and adjunction) to combine these structures. This has been used effectively in computational applications and is characterised by the slogan complicate locally, simplify globally (Bangalore and Joshi, 2010).

Analysis
In a noun-centric analysis, the light verb does not have arguments of its own. The full array of arguments for the light verb construction is instead represented in the nominal's tree. The light verb can only choose the semantic property of the nominal it may combine with (e.g., the light verb ho may combine only with nominals that have no agentive arguments). Other analyses, e.g., Ahmed et al. (2012), represent the light verb kar 'do' with arguments of its own. We discuss this in Section 5.
Our work follows the representation of Sino-Korean LVCs in Han and Rambow (2000), which also proposes separate trees for the nominal and the light verb. The elementary tree of the nominal is an initial tree, and as the nominal is considered the true predicate, it also chooses a syntactic structure that will realize all its arguments. The light verb, on the other hand, is represented as an auxiliary tree; therefore it is an adjunct to the nominal's basic structure. However, as it is a predicate, it is also a special type of auxiliary tree, viz., a predicative auxiliary tree (Abeillé and Rambow, 2000).
The second feature of this analysis, also based on Han and Rambow (2000), is the idea of the nominal as an underspecified base form. The nominal's elementary tree is not specified with respect to its category; rather, we use the label X, which projects to an XP. We also assume, following Han and Rambow, that each node is specified with the feature CAT, which has values like V or N, but the [CAT=N] feature on the noun is not realized unless the light verb composes with the elementary tree of the nominal. In addition, although the nominal is not a verb, it has the feature [TENSE=-], i.e., it is not tensed.

The light verb
In order to model the light verb kar 'do' in Example (2), we construct an auxiliary tree with feature structures, anchored at kar 'do'. Figure 4 shows such an elementary tree. Note that this is a very different tree from 'full' kar 'do', which will have all its arguments. The light verb kar is inflected for person, number, and gender as well as tense and aspect. In this particular example, it is tensed, masculine, plural and has perfective aspect; therefore it appears as kiye. We assume that morphological analysis has already taken place in a separate module, such that the correct morphological surface form has been derived for 'do, masculine plural perfective'.
In Figure 4, the XP r (root) node and its right-branching daughters are [CAT=V] with linguistic information about gender, number, tense and aspect. The feature [AGT=+] at the top node implies that this auxiliary tree needs to unify with an initial tree that is also [AGT=+]. In contrast with kar 'do', the auxiliary tree of the light verb ho 'be' will have [AGT=-]. The XP f (foot) node has [TENSE=-] and [CAT=N], which will enable it to adjoin into the elementary tree of a nominal. The CASE value is specified as NOM (nominative) as the light verb will assign nominative case to the noun. The NAGR feature is required when the light verb agrees in number and gender with the predicative nominal itself (Mohanan, 1997). As this will not occur in the examples we are working with, the value for NAGR is negative. For other 'standard' cases of agreement, the feature AGR is used. (Note that the verbal agreement rule in Hindi differs from English, as the verb agrees with the highest nominative-marked argument and not necessarily the subject (Mohanan, 1995).)
Figure 5: Tree for nominal chorii 'theft' (agentive), as seen in Ram ne gehene chorii kiye 'Ram stole the jewels'. The feature clash at XP 2 is marked with a box.

The nominal
In contrast to the impoverished argument structure of the light verb, the nominal in Figure 5 has the full array of arguments for chorii 'theft'. The tree is anchored by the lexical item chorii, and the non-terminals at NP 1 and NP 2 are marked with a ↓ for substitution with the actual lexical items.
The position of the arguments roughly follows the configuration described in Bhatt et al. (2013, p. 59), where the first position holds the ergative-marked argument, which is found in a transitive sentence (but only if the property [PERF=+] is also present). The 'second' position is one where the object of the transitive verb is found. In Figure 5, this is represented as NP 2 and is the nominative-marked argument. The elementary tree for the nominal is not complete, because of the feature clash at XP 2 between [TENSE=+] and [TENSE=-]. The feature clash represents an obligatory adjunction constraint, which will require the light verb to adjoin at this node.
The first position in Figure 5 has the features [PERF=+] and [AGT=+] as a consequence of having [CASE=ERG]. The agentive argument shares the values for PERF and AGT with the S node. This ensures that the light verb that adjoins into this tree will match the PERF and AGT values in NP 1. The argument in second position, NP 2, will share its values for AGR with XP 2. At XP 2, the values for PERF, AGT and AGR should match those of the root node of the light verb. Otherwise, adjunction will fail.
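The matching requirement at XP 2 can be made concrete with a small Python sketch. It is a simplification under stated assumptions: feature structures are flat dictionaries, unify and can_adjoin are hypothetical helpers, and the concrete values (for instance AGR written as "m.pl") are illustrative stand-ins for the contents of Figures 4 and 5, which are not reproduced here.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature structures; return None on a value clash."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def can_adjoin(aux_root_top: Features, aux_foot_bottom: Features,
               site_top: Features, site_bottom: Features) -> bool:
    """Adjunction succeeds only if root/top and foot/bottom both unify."""
    return (unify(aux_root_top, site_top) is not None
            and unify(aux_foot_bottom, site_bottom) is not None)


# XP 2 in the agentive tree for chorii (values assumed for illustration).
xp2_top: Features = {"TENSE": "+", "PERF": "+", "AGT": "+", "AGR": "m.pl"}
xp2_bottom: Features = {"TENSE": "-"}

# Root and foot of the light verb trees (values assumed for illustration).
kar_root_top: Features = {"CAT": "V", "TENSE": "+", "PERF": "+",
                          "AGT": "+", "AGR": "m.pl"}
ho_root_top: Features = {"CAT": "V", "TENSE": "+", "PERF": "+",
                         "AGT": "-", "AGR": "m.pl"}
foot_bottom: Features = {"CAT": "N", "TENSE": "-"}

kar_ok = can_adjoin(kar_root_top, foot_bottom, xp2_top, xp2_bottom)
ho_ok = can_adjoin(ho_root_top, foot_bottom, xp2_top, xp2_bottom)
```

Under these assumed values, kar can adjoin at XP 2 while ho is rejected because of the AGT mismatch, which is the behaviour described above for the agentive tree.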
The light verb's tree, as shown in Figure 4, will adjoin into the tree of the nominal. After adjunction and substitution, we obtain the composed structure seen in Figure 7.
The same noun chorii 'theft' may combine with the light verb ho. In that case, non-agentive chorii will choose an elementary tree such as the one in Figure 6. This elementary tree appears without an agentive argument. Its single nominative Theme argument has moved to the first position at NP 1, leaving behind a co-indexed trace. Figure 6 shows that the site of adjunction into chorii 'theft' (non-agentive) is at XP 1. Adjunction cannot take place at XP 2, as the feature clash is higher up at XP 1. The single nominative argument of chorii (non-agentive) will move up to NP 1 in order to receive nominative case from the node that is [CAT=V]. (Note that the node immediately above NP 2 has an underspecified CAT feature, and this requires the argument to move to a higher position.) The tree for non-agentive chorii will always combine with a light verb that is [AGT=-]. Its Theme argument will take nominative case irrespective of the tense-aspect value of the verb.

Discussion
The elementary trees for chorii 'theft', both agentive and non-agentive, are able to capture its alternations with kar 'do' and ho 'be'. This is in contrast to the approach of Ahmed et al. (2012) in an important way. They do not consider the nominal's alternation with the light verb ho 'be' as a light verb construction. Instead, they maintain that it has a resultative reading and provide a different analysis within the Lexical Functional Grammar (LFG) framework. In fact, the alternation with ho 'be' provides a useful lexical alternative to a different syntactic structure (such as a passive). The alternation of the light verbs ho 'be' and kar 'do' is moreover a characteristic of a certain group of nominals only (not all can show this alternation, e.g., intizar 'waiting'; cf. Ahmed and Butt (2011)). Therefore, we maintain that chorii ho 'theft happen' is indeed a light verb construction. The analysis in Ahmed and Butt (2011) looks at the noun and light verb as co-predicators, i.e., it is a verb-centric analysis. While this is different from the analysis proposed here, it is not impossible to construct elementary trees where the light verb's elementary tree consists of one argument, i.e., the subject, and the nominal (with its own argument) adjoins into it. The pros and cons of these two approaches need to be explored more thoroughly within the TAG framework, and we leave this to future work.
While this work has examined one class of nominals that occur as part of light verb constructions, it does not complete the analysis of light verb constructions in Hindi. The behaviour of other nominal classes remains to be explored. There are also nominals that occur with light verbs other than kar 'do' and ho 'be'. Finally, while the work presented here is mainly theoretical, it is in keeping with recent proposals for extracting a Hindi TAG grammar from a phrase structure treebank (Bhatt et al., 2012; Mannem et al., 2009). The algorithm in Bhatt et al. (2012) relies on the annotated Hindi Dependency Treebank and proposes a rule extraction system for elementary trees. Therefore, the description of Hindi LVCs in TAG would be a useful addition to the implementation of a grammar extraction task.