Paraphrasing Compound Nominalizations

A nominalization uses a deverbal noun to describe an event associated with its underlying verb. Commonly found in academic and formal texts, nominalizations can be difficult to interpret because of ambiguous semantic relations between the deverbal noun and its arguments. Our goal is to interpret nominalizations by generating clausal paraphrases. We address compound nominalizations with both nominal and adjectival modifiers, as well as prepositional phrases. In an evaluation of a number of unsupervised methods, we obtained the strongest performance by using a pre-trained contextualized language model to re-rank paraphrase candidates identified by a textual entailment model.


Introduction
Nominalizations are widely used in academic and other genres of formal texts that adopt a compact and abstract writing style. A nominalization is a noun (e.g., "treatment") that is morphologically derived from a verb ("treat"), and that designates some aspects of the event referred to by the verb (Quirk et al., 1985). Because of the systematic correspondence between nominalization and clause structure, a noun phrase headed by a deverbal noun (e.g., "surgical treatment of fracture") often has a clausal paraphrase headed by the underlying verb ("the surgery treats fracture") (Cohen et al., 2008).
This paper aims to generate clausal paraphrases for nominalizations. A key requirement in this task is to map the arguments of the deverbal noun to those of the underlying verb. In the example above, the deverbal noun "treatment" has a prepositional phrase (PP) argument ("of fracture") that corresponds to the object of the verb "treat", while the prenominal modifier ("surgical") corresponds to the subject. This mapping varies according to the semantic context of the nominalization. An alternative nominalization, such as "fracture treatment with surgery", may have its PP argument ("with surgery") mapped to the subject, and its prenominal modifier ("fracture") mapped to the object. Table 1 shows a range of possible mappings.
Our task is distinguished from previous research in terms of both the input and output. In the field of nominalization disambiguation, most studies have focused on deverbal nouns with one argument, typically a nominal modifier (Lapata, 2002; Nicholson and Baldwin, 2008). We extend the scope by addressing both nominal and adjectival modifiers, as well as PP arguments.
In terms of the output, most previous work assigned semantic role labels to arguments of the deverbal noun; in contrast, we generate paraphrases in predicate-argument form (Table 1). This approach has the advantage of avoiding commitment to a particular semantic theory, and can facilitate direct application to various NLP downstream tasks. The paraphrases may improve performance in machine translation, for example, for sentences that do not use nominalization in the target language. In information extraction, they can potentially expand query formulation to increase recall. In text simplification, the paraphrase may be easier to understand than the original nominalization for less proficient readers.
The rest of the paper is organized as follows. We define our task in the next section, and then review previous work in Section 3. After a description of the dataset in Section 4, we present our approach in Section 5 and discuss experimental results in Section 6.

Task Definition
As shown in Table 1, the input is a nominalization with two arguments. Specifically, the deverbal noun (N V ) is modified by the preceding noun (M n ) or adjective (M a ), and by a prepositional phrase with one prepositional object (O). Although our dataset does not include nominalizations with multiple prenominal and PP modifiers, or other modifier types such as s-genitives and possessive pronouns, our proposed algorithm can be extended in a straightforward manner to address them. The output is a clausal paraphrase, i.e., a clause headed by a verb that corresponds to the deverbal noun N V . This verb may have a subject, object, or prepositional phrase corresponding to M a , M n or O. Our model assumes that every input has a paraphrase. In actual deployment, the system would need to determine whether an input can be paraphrased, a task that we leave to future work.
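For illustration, the input to the task can be represented as a simple record; this sketch and its field names are illustrative only, not part of our system:

```python
from dataclasses import dataclass

@dataclass
class Nominalization:
    """Input: a deverbal noun with one prenominal modifier and one PP argument."""
    deverbal_noun: str           # N_V, e.g. "treatment"
    modifier: str                # M_n ("fracture") or M_a ("surgical")
    modifier_is_adjective: bool  # True for M_a, False for M_n
    preposition: str             # p, e.g. "of"
    prep_object: str             # O, e.g. "fracture"

# "surgical treatment of fracture", whose target paraphrase is
# "the surgery treats fracture"
example = Nominalization("treatment", "surgical", True, "of", "fracture")
```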

Previous work
Our research is most closely related to noun compound interpretation (Section 3.1) and nominal semantic role labeling (Section 3.2).

Noun compound interpretation
Research on noun compound interpretation (NCI) has taken two main approaches. One approach assigns an abstract label to describe the relation between the head noun and the noun modifier (Tratz and Hovy, 2010). Another, similar to ours, generates a paraphrase that links the two nouns with prepositions and verbs (Butnariu et al., 2010; Nakov and Hearst, 2013; Ponkiya et al., 2020), or in free form (Hendrickx et al., 2013). As unsupervised methods have recently been found to perform well in NCI (Ponkiya et al., 2020), we likewise pursue this direction.
Our work can be viewed as a special case of NCI, where the head noun is required to be a deverbal noun. Given the properties of the deverbal noun, the paraphrase in this paper takes a different form, making use of the underlying verb to form a clause.

Nominal semantic role labeling
Previous work on nominalization interpretation has mostly focused on nominal semantic role labeling (SRL), which assigns abstract labels (e.g., agent, patient) to arguments of nominalizations (Lapata, 2002;Padó et al., 2008;Kilicoglu et al., 2010). SRL can be performed with a classifier, trained on features such as syntactic structure and corpus frequencies of verb-argument pairs (Lapata, 2002;Pradhan et al., 2004). Compared to SRL, paraphrasing a nominalization requires the generation of the verb and its arguments. Some downstream NLP tasks, such as semantic parsing into meaning representations (Samuel and Straka, 2020), can directly make use of SRL output. To others, such as machine translation and text simplification, a paraphrase of the nominalization can potentially be more immediately applicable.
Previous studies have only addressed nominalizations with one argument, typically a nominal modifier (e.g., "fracture treatment"). However, adjectival modifiers ("surgical treatment") are also prevalent, and have been analyzed by linguists along with nominal modifiers under the term "complex nominal" (Levy, 1978). While there has been previous work on predicting the attribute of an adjective (Hartung et al., 2017), and related work in question answering (Greenwood, 2004), neither has been evaluated on paraphrase generation in the general domain.

Dataset
We collected 369 sentences from English Wikipedia that match the input pattern defined in Section 2, i.e., a deverbal noun (N V ) modified by the preceding noun (M n ) or adjective (M a ), and by a prepositional phrase (PP). The M a must have a pertainym in WordNet; deverbal nouns in the Academic Word List (Coxhead, 2016) with high frequency were given priority, to reflect the widespread use of nominalizations in academic text.

Two annotators, a native speaker and a near-native speaker of English, composed a paraphrase for each nominalization. A professor of linguistics who is a native speaker of English reviewed the paraphrases, either keeping both or selecting one of them. As shown in Table 2, the final dataset contains a total of 449 paraphrases. The annotators were instructed to use the simple present active for V , since tense and aspect cannot always be inferred from the rest of the sentence. They were also asked, as far as possible, to paraphrase M a with an etymologically related word, and to preserve the lemmas of M n and O. If necessary, an additional word could be inserted for clarity or fluency of the paraphrase.

Approach
The system takes as input the original sentence and the strings N V , p, O, and M a or M n (Table 1). These strings were provided, rather than automatically extracted, so that errors in automatic parsing would not confound the experimental results.

Paraphrase generation
We first generate all possible word choices for the verb and its arguments that will appear in candidate paraphrases:

Verb (V )
Candidates include all verbs that are derivationally related to N V in the WordNet database. If there is none, we use all verbs in the entry of N V in CatVar (Habash and Dorr, 2003).

Arguments (M , O)
The argument M is constructed from either M n or M a . For the former, candidates include the singular and plural forms of M n . For the latter, candidates include the singular and plural forms of its pertainyms in WordNet. The argument O remains unchanged from the input.
We place all permutations of candidates for V , M , and O in the word orders of all paraphrase types shown in Table 1, as well as their passive-voice equivalents: "O Vprt by M " (for MVO), "M Vprt by O" (for OVM), "O Vprt p M " (for VOM) and "M Vprt p O" (for VMO), where Vprt represents "be" followed by the past participle of V , and p represents any preposition except "by". The arguments M and O can be preceded by a determiner, and also by a preposition except in sentence-initial position. We used the T5 model (Raffel et al., 2020) to generate up to two words: a preposition followed by a determiner, a preposition alone, a determiner alone, or the empty string.

Paraphrase selection
We then select the best candidate paraphrase with one of the following approaches:

Sentence similarity selects the candidate paraphrase whose sentence embedding is most similar to that of the original nominalization, according to cosine similarity. We obtained sentence embeddings with the pre-trained stsb-roberta-large model (https://huggingface.co/sentence-transformers/stsb-roberta-large) from Sentence-BERT (Reimers and Gurevych, 2019).

Language model (LM) selects the candidate paraphrase that yields the highest language model score. We evaluated the log-probability score based on GPT-2 (117M), and the pseudo-log-likelihood score based on DistilBERT (Salazar et al., 2020), both from https://github.com/awslabs/mlm-scoring.

Majority + LM selects the candidate paraphrase from the majority paraphrase type, i.e., MVO for an M a input and VOM for an M n input (Table 2), that yields the highest LM score.

Semantic parser + LM first obtains the Abstract Meaning Representation of the input sentence with PERIN (Samuel and Straka, 2020) and predicts the paraphrase type from the arg0 and arg1 values of the node aligned to the deverbal noun N V . It predicts MVO if arg0 is aligned to M a or M n , and OVM if arg0 is aligned to O. Otherwise, it predicts VMO if arg1 is M a or M n , and VOM if arg1 is O. It then selects the candidate paraphrase from the predicted type that yields the highest LM score.

Textual Entailment takes the original sentence as the premise and the candidate paraphrase as the hypothesis, and predicts whether the facts in the former imply those in the latter. We used the textual entailment model from AllenNLP that is fine-tuned on SNLI with RoBERTa (Liu et al., 2019) (https://demo.allennlp.org/textual-entailment/roberta-snli). The candidate paraphrase with the highest entailment score is selected.

Textual Entailment + LM selects the candidate paraphrase with the highest language model score among the three highest-scoring candidate paraphrases identified by the Textual Entailment model.
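The Textual Entailment + LM combination amounts to a simple two-stage re-ranking, sketched below with the scoring functions abstracted away (in our system these are the entailment and language models described above; the dictionary scorers here are toy stand-ins):

```python
def rerank(candidates, entail_score, lm_score, k=3):
    """Two-stage selection: keep the k candidates with the highest
    entailment score, then return the one with the best LM score."""
    shortlist = sorted(candidates, key=entail_score, reverse=True)[:k]
    return max(shortlist, key=lm_score)

# Toy illustration with dictionary-based scorers
entail = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.99}.get
lm = {"a": -5.0, "b": -2.0, "c": -1.0, "d": -9.0}.get
best = rerank(["a", "b", "c", "d"], entail, lm)
# -> "b": candidate "c" has the best LM score but is pruned at the
#    entailment stage, and "d" entails best but scores poorly under the LM
```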

Evaluation metrics
In a clausal paraphrase, the determiner and number of the noun (M ) that corresponds to the prenominal modifier (M a or M n ) can be ambiguous and open to different interpretations. In our evaluation, we removed all determiners and lemmatized all words in both the gold and the predicted paraphrase, and then compared them on two metrics:

Paraphrase accuracy
The system is considered correct if the lemmatized forms of the predicted paraphrase and the gold paraphrase are identical.
Word order accuracy
Same as above, except that prepositions are not taken into consideration. This metric essentially measures the system's ability to predict the verb and arguments and to put them into the correct word order.

Further, we consider two experimental settings:

Gold arguments
The system has access to the gold V , M and O. In other words, it needs only to determine the paraphrase type and prepositions.
Automatic
The system automatically generates V , M and O from the input.
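The metric computation described in this section can be sketched as follows; the lemmatizer is pluggable (our experiments used a standard English lemmatizer) and defaults here to the identity function:

```python
DETERMINERS = {"a", "an", "the"}

def normalize(sentence, lemmatize=lambda w: w):
    """Lowercase, drop determiners, and lemmatize each remaining word."""
    return [lemmatize(w) for w in sentence.lower().split()
            if w not in DETERMINERS]

def paraphrase_correct(predicted, gold, lemmatize=lambda w: w):
    """Paraphrase accuracy: exact match after normalization."""
    return normalize(predicted, lemmatize) == normalize(gold, lemmatize)

def word_order_correct(predicted, gold, prepositions, lemmatize=lambda w: w):
    """Word order accuracy: as above, but prepositions are ignored."""
    drop = lambda toks: [t for t in toks if t not in prepositions]
    return drop(normalize(predicted, lemmatize)) == drop(normalize(gold, lemmatize))

# Determiners do not affect either metric:
ok = paraphrase_correct("The surgery treats a fracture", "surgery treats fracture")
```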

Results
Paraphrase accuracy. As shown in Table 3, the Textual Entailment + LM model, in conjunction with GPT-2, outperformed all other models with respect to this metric in both settings. When given the gold arguments, it achieved 51.22% accuracy, a statistically significant improvement over using Textual Entailment alone (38.21%) and DistilBERT alone (43.36%), at p = 3 × 10^-5 and p = 0.008 respectively by McNemar's test. As expected, its performance degraded in the fully automatic setting (38.48%), but it continued to outperform all other models with statistical significance, including Textual Entailment and DistilBERT (at p = 3 × 10^-4 and p = 0.02 respectively). These results suggest that the language model and the textual entailment model complement each other. While the former optimizes the likelihood of a candidate paraphrase on its own, the latter also takes into account word choices in the premise (i.e., the nominalization) when estimating semantic equivalence, including the preposition: e.g., the choice of preposition in "environmental effect of" vs. "environmental effect on" indicates whether "environment" should serve as subject or object in the paraphrase.

Word order accuracy. The Textual Entailment + LM model gave the strongest performance in both settings, but with DistilBERT rather than GPT-2 on this metric. In the "gold arguments" setting, its accuracy is highest at 65.58%, followed by DistilBERT (62.87%) and Textual Entailment (60.70%), although the improvements are not statistically significant (p = 0.12 and p = 0.34 respectively by McNemar's test). In the automatic setting, it yields 51.76% accuracy, again outperforming both Textual Entailment (statistically significant at p = 0.006) and DistilBERT (not significant at p = 0.33). Further, our model, which outperformed Semantic parser + LM in all settings, has potential to improve semantic role labeling.
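The significance figures above come from McNemar's test on paired system outputs. One common exact formulation (our assumption here; a chi-square approximation is also widely used) can be computed directly from the discordant counts:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test.
    b: items only system A got right; c: items only system B got right.
    Under H0, the b + c discordant items split 50/50 between the systems."""
    n = b + c
    if n == 0:
        return 1.0
    # two-sided exact p-value: double the smaller binomial tail, capped at 1
    tail = sum(comb(n, k) for k in range(min(b, c) + 1))
    return min(1.0, 2 * tail / 2 ** n)
```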

Conclusion
We have presented a study on generating clausal paraphrases for compound nominalizations. Extending previous work, our dataset contains nominalizations with two arguments, namely a prepositional phrase argument as well as a nominal or adjectival modifier. We evaluated a number of unsupervised methods for paraphrase generation. Experimental results show that using a textual entailment model, followed by re-ranking with a language model score, yields the best performance. The proposed method can contribute to downstream NLP tasks that require natural language understanding of texts in which nominalizations are frequently found.