Light Verb Constructions and Their Families - A Corpus Study on German ‘stehen unter’-LVCs

The paper reports on a corpus study of German light verb constructions (LVCs). LVCs come in families which exemplify systematic interpretation patterns. The paper's aim is to account for the properties determining these patterns on the basis of a corpus study on German LVCs of the type 'stehen unter NP' ('stand under NP').


Introduction: LVCs and their families
Light verb constructions (LVCs) are a specific type of predicatively used multiword expression (MWE). An LVC consists of a semantically light verb and a phrasal element, e.g., a PP as in the German examples in (1).
(1) a. unter Beobachtung stehen 'be under observation' (lit. under observation stand)
    b. unter Schutz stehen 'be under protection' (lit. under protection stand)
The German LVCs in (1) consist of the light verb stehen 'stand' and a prepositional phrase (PP) headed by unter 'under'. Stehen is, according to Kamber (2008), one of the most frequently occurring light verbs in German. The English notion 'light verb' goes back to Jespersen (1942), who assumed that light verbs are semantically empty. This position has been questioned by a number of authors (e.g. Isoda 1991; Brugman 2001; Butt 1995; Butt and Geuder 2001, 2003; Butt and Lahiri 2013; Fleischhauer and Neisani 2020), who insist that the light verb makes at least a subtle contribution to the LVC's overall meaning. This position also prevails in the German research tradition. von Polenz (1963), who introduced the corresponding German notion 'Funktionsverb' (lit. function verb), recognized that light verbs contribute aktionsart features as well as causativity. Thus, the light verb is not semantically empty but only semantically reduced compared to its corresponding heavy uses.
In its heavy use (2-a), stehen expresses that its subject referent is spatially located in an upright posture; the spatial location is specified by the PP-complement (see Gamerschlag et al. 2013 for a detailed discussion of German posture verbs). As a light verb, stehen does not express that its subject referent is spatially located (2-b). Rather, the verb only contributes to the complex predicate's event structure. LVCs headed by stehen always express state predications (e.g. von Polenz, 1963, 1987).
(2) a.
The PP-internal noun provides the LVC's main predicational content. The LVC in (2-b) expresses that the subject referent is in a state of shock; substituting the noun by e.g. Stress 'stress' results in a different predication. The LVC unter Stress stehen 'be stressed' (lit. under stress stand) expresses that the subject referent is in a state of stress.
Like simplex predicates, LVCs can be classified with respect to semantic features like aktionsart and causativity. These features have been systematically related to the light verb's lexical meaning (e.g. von Polenz 1963, 1987). But it has rarely been noticed that systematicity is also found on a semantically deeper level. The LVCs in (1) exemplify a different interpretation pattern from those in (2). Following Nunberg et al. (1994), I use the label 'family' to designate LVCs which conform to the same interpretation pattern. The notion of an LVC-family is defined as follows (following Fleischhauer 2019, 32; Fleischhauer and Turus in press):
(3) Light verb constructions form a family if
    (i) they vary only with respect to their NP element, and
    (ii) they exemplify the same interpretational pattern.
The LVCs in (1) belong to a family I call 'event passive-family' since they are paraphrased by an event passive construction (the so-called werden 'become'-passive). Unter Beobachtung stehen in (1-a) is paraphrased as 'beobachtet werden' ('be observed'; lit. observed become). The two LVCs unter Schock stehen 'be shocked' (2-b) and unter Stress stehen 'be stressed' superficially look like the LVCs in (1) but resist an event passive paraphrase. Instead, they are paraphrased by a state passive construction (sein 'be' + passive participle). Unter Schock stehen (2-b), for example, is paraphrased as 'geschockt sein' ('be shocked'; lit. shocked be). The current paper presents a first systematic case study of LVC-families. The central questions are: Which LVCs are members of these families? And what are the characteristic properties of the members of the individual families? These questions have been explored on the basis of a corpus study.

Corpus study
For the corpus study on German stehen unter-LVCs, I used the Tagged-C2 archive of the German reference corpus (DeReKo). The archive mainly contains newspaper articles and consists of 1,022,895,699 words organized in 4,491,138 texts. The corpus search was carried out using the search engine COSMAS II.
I will start with a brief discussion of the search criterion used for the corpus study and then proceed to the individual annotation steps. The annotation was done independently by two annotators; in case of disagreement, a third annotator was consulted.
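The annotation protocol just described can be sketched in a few lines of Python. This is only an illustration: the function name `adjudicate` and the label strings are hypothetical, and the rule that the third annotator's judgement is taken as final is an assumption, since the text only states that a third annotator was consulted.

```python
from typing import Optional


def adjudicate(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the final label for one sentence.

    Assumption (not stated in the paper): when the two independent
    annotators disagree, the third annotator's label is adopted.
    """
    if first == second:
        # Both annotators agree, no third judgement is needed.
        return first
    if third is None:
        raise ValueError("annotators disagree; a third judgement is required")
    return third


# Hypothetical labels for the first annotation step.
print(adjudicate("complement", "complement"))
print(adjudicate("complement", "not complement", third="complement"))
```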

Search criterion
LVCs cannot be identified directly within the German reference corpus, for two reasons. First, LVCs cannot be distinguished from regular predicate-argument constructions on the basis of morphosyntactic criteria. The two sentences in (2) look superficially similar even though only the second one contains an LVC. Some authors propose that LVCs can be distinguished from regular predicate-argument constructions on the basis of the semantic type of the PP-internal noun: LVCs require an eventive noun in PP-internal position, whereas regular predicate-argument constructions do not (e.g. von Polenz, 1963, 1987; Engelen, 1968; Persson, 1994; Helbig, 1984, 2006; Langer, 2004, 2005; Ježek, 2016; Savary et al., 2018). This criterion is rejected by other authors, for example Klein (1968), Herrlitz (1973), Schwall (1991), Rostila (2001) and Hanks et al. (2006). In addition, the language data discussed in Section 3 indicate that LVCs are not restricted to eventive nouns in PP-internal position but license, for example, artefact nouns as well.
Second, the individual components of an LVC can be separated by lexical material which does not belong to the MWE. In the interrogative sentence in (4), the subject NP intervenes between the light verb and the unter-PP. Nagy T. et al. (2020, 326) mention that a discontinuous realization of LVCs is particularly frequent in German (compared to e.g. English, Spanish and Hungarian); this is probably due to general constraints on German word order. Discontinuity is a challenging property for the identification of MWEs in general (e.g. Constant et al., 2017).

(4) Steht der Verdächtige unter Beobachtung?
    'Is the suspect under observation?'
Given the mentioned difficulties in identifying LVCs, I searched for all occurrences of inflected stehen and the preposition unter realized within the same sentence (search string '&stehen \s0 unter'). This search criterion yielded 80,255 hits, of which 8,023 sentences (approx. 10% of all hits) were randomly sampled for manual annotation. 55 sentences were excluded from the annotation procedure since they were incomplete.
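The search criterion and the sampling step can be approximated outside COSMAS II as follows. This is a minimal sketch under stated assumptions, not the query engine used in the study: the hand-written set `STEHEN_FORMS` is a hypothetical and incomplete stand-in for the lemma operator `&stehen`, sentences are assumed to be pre-segmented strings, and the function names and the seed are illustrative.

```python
import random
import re

# Hypothetical, incomplete list of inflected forms of 'stehen'. The study
# itself relies on the lemma operator of COSMAS II, not on such a list.
STEHEN_FORMS = {
    "stehen", "stehe", "stehst", "steht",
    "stand", "standst", "standen", "standet",
    "stünde", "stünden", "gestanden",
}

TOKEN = re.compile(r"\w+")


def cooccurs(sentence: str) -> bool:
    """True if a form of 'stehen' and 'unter' occur in the same sentence,
    regardless of order and of intervening material."""
    tokens = {tok.lower() for tok in TOKEN.findall(sentence)}
    return bool(tokens & STEHEN_FORMS) and "unter" in tokens


def draw_sample(hits, k, seed=0):
    """Draw k hits at random without replacement (seed is an assumption,
    added only to make the sketch reproducible)."""
    rng = random.Random(seed)
    return rng.sample(hits, k)


print(cooccurs("Steht der Verdächtige unter Beobachtung?"))
```

Because the check operates on the set of tokens in a sentence, discontinuous realizations such as (4) are matched in the same way as contiguous ones.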
Although there exists a substantial literature on the annotation of MWEs in general and of LVCs in particular (e.g. Krenn 2008; Tu and Roth 2011; Rácz et al. 2014; Savary et al. 2018; Nagy T. et al. 2020), these studies differ in scope from the present one. The present study is not concerned with LVCs in general or with LVCs headed by a specific type of light verb but is directed at a specific combination of light verb and preposition. This allowed the use of more specific annotation criteria which were directly tailored to this type of construction.
LVC-families have not been the subject of corpus studies so far.

First annotation step
The unter-PP is a syntactic complement of stehen, both in the verb's light and in its heavy uses. In a first annotation step, we singled out those sentences in which the unter-PP is not realized as the verb's complement. The relevant test criterion is whether the PP can be left out without affecting the acceptability of the resulting sentence. If not, the PP is classified as being a complement of stehen. The results of the first annotation step are summarized in Table 1.
PP complement    PP not complement
5822             2146
Table 1: Results of the first annotation step.
The sentences in which the PP is not a complement of stehen were excluded from further analysis.

Second annotation step
The second annotation step consisted in distinguishing heavy from non-heavy uses of stehen. Non-heavy uses comprise light uses as well as what Fazly and Stevenson (2007, 10) term 'abstract uses'. As a heavy verb, stehen can be substituted by other posture verbs (e.g. sitzen 'sit' or liegen 'lie') or by purely locational predicates like positioniert sein 'be positioned' or lokalisiert sein 'be localized'. In (5-a), stehen can be substituted by, for example, sitzen or liegen and is therefore classified as a 'heavy' verb. The substitution of stehen by a different posture verb is unacceptable in (5-b). Accordingly, this use of stehen is classified as 'non-heavy'. The results of the second annotation step are summarized in Table 2. There is a clear preference for stehen in combination with the preposition unter to be used as a non-heavy verb.
heavy use    non-heavy use
562          5260
Table 2: Results of the second annotation step.
The third annotation step was carried out only for the sentences classified as containing a non-heavy use of stehen.
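The counts reported for the sample and for the first two annotation steps can be checked for internal consistency with a short calculation. The sketch below uses only figures stated in the text; the variable names and the helper `share` are illustrative.

```python
# Counts as reported in the text and in Tables 1 and 2.
hits = 80255               # sentences matching the search criterion
sampled = 8023             # randomly sampled sentences
incomplete = 55            # excluded because they were incomplete
pp_complement = 5822       # Table 1
pp_not_complement = 2146   # Table 1
heavy = 562                # Table 2
non_heavy = 5260           # Table 2

annotated = sampled - incomplete

# Each step partitions the output of the previous one.
assert annotated == pp_complement + pp_not_complement
assert pp_complement == heavy + non_heavy


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


print(f"sample as share of all hits:      {share(sampled, hits):.1f}%")
print(f"PP complements among annotated:   {share(pp_complement, annotated):.1f}%")
print(f"non-heavy among PP complements:   {share(non_heavy, pp_complement):.1f}%")
```

The last figure quantifies the stated preference for non-heavy uses of stehen in combination with unter.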

Third annotation step
The final annotation step consisted in identifying LVC-families. Since the focus is on the two LVC-families introduced above, it was only checked whether the combination of light stehen and its PP-complement is paraphrased by a sentence containing an event passive or a state passive construction. The two types of paraphrases have already been introduced in Section 1. As summarized in Table 3, 1335 occurrences require an event passive paraphrase and 1524 sentences are paraphrased by use of a state passive construction. The two LVC-families represent 49.23% of all non-heavy uses of stehen within the analyzed sample.
event-passive paraphrase    state-passive paraphrase
1335                        1524
Table 3: Results of the third annotation step.
An example of a non-heavy use of stehen rejecting an event passive or state passive paraphrase is shown in (6). The construction unter dem Motto stehen (lit. under the motto stand) is paraphrased as 'have as its motto' which is not a passive paraphrase but a paraphrase expressing abstract (predicative) possession.
Based on the data of the third annotation step, the individual members of the two LVC-families have been identified. The event passive-family is represented by 33 different LVCs; for the state passive-family, 19 different members have been found. The full list of nouns occurring PP-internally in the two families is given in the appendix. With respect to the third annotation step, the two annotators were in total agreement.
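Deriving family membership from the paraphrase annotation amounts to grouping the PP-internal nouns by paraphrase type. The following sketch illustrates this with a handful of nouns mentioned in the text; the record format, the label strings and the function name `family_members` are assumptions, and the toy list is not the annotated data set.

```python
from collections import defaultdict

# Mapping from paraphrase label to family name (label strings are
# hypothetical; the family names follow the text).
FAMILY_BY_PARAPHRASE = {
    "event_passive": "event passive-family",
    "state_passive": "state passive-family",
}


def family_members(annotations):
    """Collect the distinct PP-internal nouns of each family.

    `annotations` is an iterable of (noun, paraphrase_label) pairs, one
    per annotated sentence. Uses that accept neither passive paraphrase
    (e.g. 'unter dem Motto stehen') are ignored.
    """
    members = defaultdict(set)
    for noun, paraphrase in annotations:
        family = FAMILY_BY_PARAPHRASE.get(paraphrase)
        if family is not None:
            members[family].add(noun)
    return {family: sorted(nouns) for family, nouns in members.items()}


# Toy input built from examples discussed in the paper.
toy = [
    ("Beobachtung", "event_passive"),
    ("Schutz", "event_passive"),
    ("Beobachtung", "event_passive"),
    ("Schock", "state_passive"),
    ("Stress", "state_passive"),
    ("Motto", "other"),
]
print(family_members(toy))
```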

The semantic type of the PP-internal nouns
In a final step, all nouns occurring PP-internally were classified with respect to the type of object they denote. It was first checked whether the PP-internal nouns denote an eventuality. The notion 'eventuality' is used as a cover term for states and events (Bach, 1986). Eventuality-denoting nouns accept temporal (e.g. gestern 'yesterday') and aspectual modifiers (e.g. andauernd 'continuous') (Fleischhauer and Neisani, 2020). Only five nouns (7-a), all belonging to the state passive-family, reject temporal/aspectual modification. An example of an LVC containing the artefact noun Drogen 'drugs' is shown in (7-b). The example expresses that the subject referent is in a state induced by drugs (i.e. is influenced by drugs).
With respect to the eventuality-denoting nouns, the two LVC-families show clear differences. The PP-internal nouns occurring in the event passive-family denote events; those occurring in the state passive-family are state-denoting. There exist a number of criteria which allow distinguishing event-denoting nouns from state-denoting ones. Only event-denoting nouns can be realized as the subject of predicates like geschehen/passieren 'happen', beenden 'stop/finish' and unterbrochen sein 'be interrupted'. For details concerning the criteria, the reader is referred to the literature. In Section 1, I introduced the LVC unter Beobachtung stehen 'be under observation' as a representative member of the event passive-family. The example in (8) demonstrates that the noun Beobachtung 'observation' can be realized as the subject of geschehen 'happen'. The noun is also licensed as the subject argument of the other mentioned predicates (not illustrated for reasons of space) and qualifies as event-denoting.
'Probably, the observation happened with the help of a not too bad telescope' (http://www.vm2000.net/category/ausgabe-80/; 28.04.2021)
The noun Schock 'shock', which occurs in the LVC unter Schock stehen 'be shocked', a representative member of the state passive-family, shows a somewhat more variable behavior. Although Schock can be realized as the subject of geschehen, as shown in (9), it can be realized neither as the subject argument of beenden 'stop/finish' nor of unterbrochen sein 'be interrupted'. The cumulative evidence speaks in favor of classifying Schock as a state-denoting noun.
The interpretational difference observed between the LVCs of the two families is not arbitrary but results from the specific meaning of the nouns licensed in PP-internal position. Event-denoting nouns allow for an event passive interpretation; state nouns result in a state passive reading. The artefact nouns Alkohol 'alcohol', Drogen 'drugs', Suchtmittel 'addictive substances', Beruhigungsmittel 'sedative' and Medikamente 'medicine' are associated with a specific state, namely the state of being intoxicated by the respective substance, and give rise to a state passive reading as well. This interpretation is not restricted to the use of these nouns within the mentioned LVC since it is also found without the light verb. The nouns Drogen 'drugs' and Alkohol 'alcohol' are conjoined within a PP headed by unter in (10), which is realized as an adjunct PP. As in (7-b), the PP indicates that the subject referent has been under the influence of drugs and alcohol.
(10)
Not only Drogen can be realized within an unter-PP without light stehen; the same is true of the other nouns occurring in the two families. This is a relevant observation as it demonstrates that the passive-like interpretation depends only on this specific use of the preposition unter, and neither on the light verb nor on the light verb construction as such. The basic function of the light verb is to embed the passive-like meaning expressed by the PP within a state predication.
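The diagnostics used in this section to determine the semantic type of a PP-internal noun can be summarized as a small decision procedure. The sketch below is an illustration only: the function name `semantic_type` and the result labels are hypothetical, and resolving mixed subject-test results by simple majority is an assumption standing in for the 'cumulative evidence' invoked for Schock.

```python
from typing import Dict


def semantic_type(accepts_modifier: bool, subject_tests: Dict[str, bool]) -> str:
    """Classify a noun from the diagnostics described in the text.

    accepts_modifier: the noun accepts temporal/aspectual modifiers
        (e.g. gestern, andauernd); if not, it is not eventuality-denoting.
    subject_tests: for each predicate (geschehen, beenden, unterbrochen
        sein), whether the noun can be realized as its subject.

    Assumption: mixed results are resolved by simple majority.
    """
    if not accepts_modifier:
        return "non-eventuality"
    passed = sum(subject_tests.values())
    return "event" if passed > len(subject_tests) / 2 else "state"


# Test outcomes as reported in the text.
beobachtung = {"geschehen": True, "beenden": True, "unterbrochen sein": True}
schock = {"geschehen": True, "beenden": False, "unterbrochen sein": False}

print(semantic_type(True, beobachtung))
print(semantic_type(True, schock))
# Drogen rejects temporal/aspectual modification, so the subject tests
# are not consulted.
print(semantic_type(False, {}))
```

Note that a 'non-eventuality' result does not by itself predict a reading: the five substance nouns receive a state passive interpretation because they are associated with a state of intoxication, which this sketch does not model.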

Conclusion & Outlook
The paper started from the observation that LVCs instantiated by the same morphosyntactic type (in our case 'stehen + unter') are heterogeneous with respect to their interpretation. LVCs of this type exemplify (at least) two different systematic interpretation patterns (which have been termed 'families'). Both families share a passive-like interpretation which has been related to the specific use of the preposition unter. The differences between the two families have been related to the semantic type of the PP-internal nouns. The existence of LVC-families has (to the best of my knowledge) so far only been recognized for Persian LVCs (e.g. Family, 2006, 2008, 2011, 2014) but has gone unnoticed for other languages (especially for German). It will definitely be worth investigating whether we come across similar or even the same LVC-families in other languages. A natural candidate to look at might be Dutch, which, in contrast to other languages such as Turkish or Persian, shows a light use of a verb meaning 'stand'.
Another question to be investigated in the future is whether we can identify further characteristics with respect to which the mentioned LVC-families differ from each other. A promising feature to look at is causativization, since the two families seem to show different preferences in the choice of their causative light verb. LVCs of the event passive-family prefer stellen 'put' (lit. cause to stand), those of the state passive-family prefer setzen 'put' (lit. cause to sit). The results of a limited corpus study on the distribution of the two causative light verbs stellen and setzen are summarized in Table 4. The first two LVCs belong to the event passive-family, the second two LVCs to the state passive-family. Each LVC was searched for individually within the German reference corpus (search strings: '&stellen \s0 unter N' and '&setzen \s0 unter N'; 'N' has been replaced by the individual nouns). For reasons of space, I cannot go into further details (especially with respect to the motivation of the different preferences) but take this as a promising starting point for a follow-up study on the different families of stehen unter-LVCs.
Concerning further automation, we are planning to train learning algorithms on the basis of the annotated data set for the automatic identification of stehen-LVCs.