FrameNet-assisted Noun Compound Interpretation

Given a noun compound (NC), we address the problem of predicting the appropriate semantic label linking the constituents of the NC. This problem is called Noun Compound Interpretation (NCI). We use FrameNet as a semantic label repository. For example, given the noun compound board approval, we predict the frame (DENY OR GRANT PERMISSION, as per FrameNet) as appropriate and the semantic role of the modifier word (AUTHORITY) as the semantic label linking board and approval; the resulting label is DENY OR GRANT PERMISSION:AUTHORITY. Our semantic label repository is very large (≈11k labels) compared to the NC data available for training (≈1,900 examples). Thus, learning in this setting, especially for unseen semantic labels, is hard. We propose to solve this problem by predicting semantic labels in a continuous label embedding space, which is novel. This embedding space is created by learning label embeddings from the FrameNet data. The embeddings are then used to train two separate models: one for predicting frames and the other for frame elements (FEs). As the label embedding space captures the semantics of the labels, using these embeddings enables generalizing well to unseen labels, thus achieving zero-shot learning. Our preliminary investigations show that the proposed approach performs well on unseen labels, achieving improvements of 5 and 2 percentage points over baselines for frame and FE prediction, respectively. The study shows the promise of continuous-space embeddings for noun compound interpretation and points to the need for further investigation.


Introduction
A noun compound is a sequence of two or more nouns that act as a single entity with a well-defined meaning (e.g., paper submission, colon cancer). The semantic relations between the component nouns are implicit: for instance, the information that 'it is a juice made from orange(s)' is hidden in orange juice. Uncovering this semantic relation is the problem of Noun Compound Interpretation (NCI). NCI requires machine learning, as the task faces the challenge of ambiguity, and disambiguation by rules is well-nigh impossible because of the multifarious, complex underlying language phenomena. Storing NCs and interpreting them by table lookup is also impractical due to the large number of NCs and their high productivity (new nouns and NCs are created frequently; e.g., corona vaccine is a relatively new NC).
Often, the exact relation, sentiment, etc. are also governed by contextual pragmatics. For instance, the sentiment towards tax money depends on who the beneficiary is, which in turn depends on the predicate. The predicate give could indicate negative sentiment (for the tax-payer), whereas the predicate receive would indicate positive sentiment (for the government). Due to such instances, NLP tasks such as machine translation (Baldwin and Tanaka, 2004; Balyan and Chatterjee, 2015), textual entailment (Nakov, 2013), question answering (Ahn et al., 2005), etc. suffer when they encounter noun compounds. For example, for the text-question pairs below, a system would need to interpret the underlying semantics within the compound to answer the question correctly.
(a) "student protest": "who is protesting?", (b) "fee-hike protest": "why protest?", and (c) "university protest": "where is the protest?"

In this work, we interpret only compositional noun-noun compounds. A noun-noun compound is categorised as compositional if the meaning of the compound can be composed from the semantics of the individual nouns present.
From a relation representation perspective, noun compounds are interpreted in two ways: via labelling and paraphrasing. Labelling involves assigning an abstract semantic relation from a predefined set, for example, orange juice: MADEOF, hillside home: LOCATION, etc. There are many inventories of predefined semantic relations.
We use the FrameNet-based labels proposed by Ponkiya et al. (2018a). As per their convention, the head noun of a compound invokes the frame, and the modifier noun fits in one of the frame elements of the invoked frame; cf. 'board approval' in the abstract.
There are more than 11,000 FEs in FrameNet, and we have about 1900 training examples. Thus, the average number of examples per label is quite small, and many labels do not have a training example.

In summary, the contributions of this paper are three-fold:

1. We embed FrameNet entities in a continuous space, perform prediction in the continuous space to generalize over unseen labels, and show performance improvement on the unseen labels.
2. We create a noun-compound annotation tool that assists annotators in providing manual labels, and we release it publicly.
3. Using the above tool, we extend the dataset released by Ponkiya et al. (2018a) with 326 more manually-annotated gold samples, and release it for further research.
The rest of the paper is organized as follows: Section 2 discusses related work, Section 3 gives an overview of the foundations for this work, Section 4 details our approach, Section 5 provides experimental details (the dataset used and the training/testing setup), and Section 6 discusses the results and analysis, followed by the conclusion and future work. The code, dataset and the tool can be downloaded from http://www.cfilt.iitb.ac.in/nc-dataset.

Related Work
A relation between the components of a noun compound (say, chocolate cake) can be represented in one of the following two ways: (1) assigning a relation from a predefined set of semantic relations (MADEOF), or (2) using a paraphrase to convey the underlying semantic relation ("cake made using chocolates" or "cake with chocolate flavor").
Noun-compound (NC) interpretation via labelling is the most commonly used methodology for NC interpretation. Scholars have proposed many inventories of semantic relations (Levi, 1978; Warren, 1978; Vanderwende, 1994; Lauer, 1995; Barker and Szpakowicz, 1998; Ó Séaghdha, 2007; Rosario et al., 2001; Tratz and Hovy, 2010; Fares, 2016; Ponkiya et al., 2018a). A recent FrameNet-based inventory by Ponkiya et al. (2018a) proposed FEs (frame elements) from FrameNet as labels (or semantic relations). They released a dataset by annotating each noun compound with a frame and a frame element, and proposed this annotation for predicate nominalization; however, it also works for most cases of predicate deletion.
For automatic labelling, the architectures of Dima and Hinrichs (2015) and Fares et al. (2018) are similar to ours. Dima and Hinrichs (2015) proposed a feed-forward neural-network-based approach: the network takes the concatenated embeddings of the component nouns as input and predicts one of the labels from Tratz and Hovy (2010)'s label set. Fares et al. (2018) used a similar feed-forward network to predict two types of relations; this network shares the initial layers and has separate output layers for each label type.
NC interpretation via paraphrasing is another methodology, comprising approaches such as prepositional and free paraphrasing. Prepositional paraphrasing, i.e., paraphrasing using a preposition (for example, student protest: "protest by student(s)"), is a relatively well-attended problem (Lauer, 1995; Lapata and Keller, 2004; Ponkiya et al., 2018b). All the above approaches for prepositional paraphrasing use the fixed set of eight prepositions proposed by Lauer (1995). The other family of approaches, free paraphrasing, has not received much attention; apart from two SemEval tasks (Butnariu et al., 2009; Hendrickx et al., 2013), little literature is available. A recent study (Ponkiya et al., 2020) expresses paraphrasing as a "fill-in-the-blank" problem and utilizes pre-trained language models for noun-compound interpretation.

Foundations
Levi (1978) performed a linguistic study to understand how noun compounds are generated; they call such compounds nominal compounds. This theory puts nominal compounds into two categories, based on the compounding process:

1. Predicate Deletion: a predicate between the components is dropped to create a compound. For example, apple pie is a "pie made from apple"; the predicate made from is dropped in this case. Similarly for elbow injury, gas pipeline, etc.

2. Predicate Nominalization:
Here, the head noun is a nominalized form of a verb, and the modifier is an argument of the verb.
For example, "The union demonstrated against the price hike . . . " becomes "The union demonstration against the price hike . . . ". Examples with a verbal noun as head: student demonstration, government approval, opposition objection, etc. Examples with a verb form as head: student protest, government support, competition schedule, etc.
Levi (1978) also proposed a set of abstract predicates for the former category, but no relations for the latter category. Later, Ó Séaghdha (2007) revised this inventory and proposed a two-level hierarchy of semantic relations. Ponkiya et al. (2018a) proposed the use of FrameNet-based labels for noun compounds: the head noun invokes a frame, and the modifier noun fits in one of the slots of the frame. They also prepared a dataset by annotating each noun compound with a frame and a frame element. Ponkiya et al. (2018a) proposed this annotation for predicate nominalization, but it also works for most cases of predicate deletion.

FrameNet
FrameNet (Baker et al., 1998) is a taxonomy based on Fillmore's theory of Frame Semantics. This theory claims that the meanings of most words can be inferred based on a semantic frame: a conceptual structure that denotes an abstract event, relation, or entity and the involved participants. For example, the concept of questioning involves a person asking a question (SPEAKER), the person or people being questioned (ADDRESSEE), the content of the question (MESSAGE), and so on. In FrameNet, such a concept is represented by the QUESTIONING frame. The participating entities, such as SPEAKER, ADDRESSEE, MESSAGE, etc., are called frame elements (FEs). Frames are invoked in running text via words known as lexical units; some of the lexical units for the QUESTIONING frame are ask, grill, inquire, inquiry, interrogate, query, etc. FrameNet provides two types of linkages between entities: (a) relations, linking frames to frames or FEs to FEs, and (b) mappings, linking words to frames and frames to FEs.

Relations
FrameNet includes a graph of relations among frames as well as relations among frame elements. One of the most important frame relations is Inheritance, which is close to a typical Is-A relation. In our work, we utilize the relations for the generation of frame and frame element embeddings (§3.2), and we utilize the mappings to prune the search space (§4.1).

Knowledge Graph Embeddings
A Knowledge Graph G is a set of relations R defined over a set of entities E. Formally, it comprises a set of N triples (h, r, t), where h and t are called the head and tail entities, and r denotes a relation between them.
Knowledge Graphs are widely used to store knowledge in a structured format, and they play an important role in representation learning. Methods for learning representations for both entities E and relations R have been explored (Wang et al., 2017) with an aim to represent graphical knowledge.
Various algorithms for representation learning have been proposed, which support tasks such as link prediction. TransE (Bordes et al., 2013), for example, models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. We use the ConvE (Dettmers et al., 2018) algorithm to get embeddings of frames and frame elements; for the training of ConvE, we treat all relations from FrameNet as triples of a knowledge graph.
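To make the translation intuition concrete, here is a minimal pure-Python sketch of TransE-style scoring. The embedding values are invented for illustration; trained embeddings would come from optimizing this score over real triples.

```python
import math

def transe_score(h, r, t):
    """TransE models a relation as a translation in embedding space:
    h + r should land close to t. The score is the negative L2 distance
    ||h + r - t||, so higher scores mean more plausible triples."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 3-dimensional embeddings (illustrative values, not trained).
frame_a = [0.1, 0.2, 0.3]    # head entity, e.g. a frame
inherits = [0.4, 0.0, -0.1]  # relation, e.g. Inheritance
frame_b = [0.5, 0.2, 0.2]    # tail that fits frame_a + inherits exactly
frame_c = [-0.9, 0.8, 0.7]   # an unrelated tail

print(transe_score(frame_a, inherits, frame_b) > transe_score(frame_a, inherits, frame_c))  # True
```

A plausible triple scores higher (closer to zero) than an implausible one, which is the property link-prediction training exploits.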

ConvE
ConvE (Convolution-based Embeddings) is a multi-layer 2D-convolutional network model proposed by Dettmers et al. (2018). It uses fewer parameters, yet is efficient compared to similar models. It defines the scoring function for each relation r as

ψ_r(e_h, e_t) = f(vec(f([ē_h; ē_r] ∗ w)) W) e_t,

where e_h, e_r and e_t are the embeddings of the head h, relation r and tail t, respectively; x̄ denotes the reshaping of a vector x into a matrix; f is the rectified linear unit (ReLU) function; vec converts a matrix into a flat vector; w is the convolution kernel; and W is the parameter matrix of a fully connected layer. For training, ConvE applies the logistic sigmoid function σ(·) to the scores, p = σ(ψ_r(e_h, e_t)), and minimizes the binary cross-entropy

L(p, t) = −(1/N) Σ_i ( t_i · log(p_i) + (1 − t_i) · log(1 − p_i) ),

where t_i is 1 when the corresponding triple (h, r, t) ∈ G and 0 otherwise. ConvE uses two embedding layers, one for entities and one for relations, both initialized randomly and updated during training. At the end of training, the embedding layers contain the embeddings of the entities and relations.
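The scoring function can be sketched in miniature. The following pure-Python toy, with hand-picked dimensions and untrained parameters chosen purely for illustration (real ConvE also uses dropout and batch normalization), mirrors the pipeline: reshape-and-stack, 2D convolution, ReLU, flatten, a fully connected layer, then a dot product with the tail embedding.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def conv2d_valid(img, kernel):
    """2-D 'valid' convolution (cross-correlation, as in deep learning)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def conve_score(e_h, e_r, e_t, kernel, W):
    """psi_r(e_h, e_t): reshape e_h and e_r to 2x2, stack into a 4x2 'image',
    convolve, apply ReLU, flatten, pass through a fully connected layer
    (the columns of W), apply ReLU, then dot with the tail embedding."""
    img = [e_h[:2], e_h[2:], e_r[:2], e_r[2:]]
    flat = relu([v for row in conv2d_valid(img, kernel) for v in row])
    hidden = relu([sum(f * w for f, w in zip(flat, col)) for col in W])
    return sum(h * t for h, t in zip(hidden, e_t))

# Tiny untrained example: 4-d embeddings, a 2x2 kernel, a 3->4 dense layer.
e_h, e_r, e_t = [0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.1, 0.2, 0.4]
kernel = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.2, 0.1, 0.3], [0.4, 0.0, 0.1], [0.1, 0.1, 0.1], [0.0, 0.2, 0.2]]
score = conve_score(e_h, e_r, e_t, kernel, W)  # a single plausibility score
```

During training, σ(score) would be pushed towards 1 for triples in the graph and towards 0 for corrupted triples.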

Our Approach
FrameNet has 1223 frames and 11,473 frame elements. The existing dataset for FrameNet-based noun compound interpretation, however, does not have examples for many frames and frame elements. Unlike other relation inventories, though, we have the FrameNet taxonomy, which can help in building a better model. We first explain our frame prediction approach and then extend it to FE prediction.

System Architecture
We encode a given noun compound nc = w1 w2 (say, divorce rate) using a feed-forward network to get a vector v_nc. Using the FrameNet API, we create a set of candidate frames that can be invoked by w2 (rate → {ASSESSING, PROPORTION, SPEED DESCRIPTION, etc.}). For each candidate frame f_i, we take its frame embedding e_fi from the frame embedding layer and compute the dot product of v_nc with e_fi as the score of the frame. For testing, we predict the candidate frame with the highest score:

f* = argmax over candidates f_i of (v_nc · e_fi)

Figure 1 shows the basic system architecture, illustrating frame prediction for divorce rate.

Dima and Hinrichs (2015) and Fares et al. (2018) use a simple feed-forward network. If, in our model, we removed the frame/FE embedding matrix and used it as the weights of one more dense layer (after the fully-connected layers), our model would become identical to theirs. By not doing so, (a) our model does not need any extra computation to score labels that are not part of the candidate set, and (b) back-propagation does not have to pass through an additional layer, which might not be effective.
We implement these models in PyTorch (Paszke et al., 2017). We initialize the word embedding layer with Google's pre-trained embeddings (footnote 3), and we initialize the frame embedding layer with random values in one case (the baseline) and with pre-trained frame embeddings in the other.
We use the same architecture to train another model for FE prediction, replacing the frame embedding layer with an FE embedding layer; the candidate FEs are the FEs of all candidate frames. If no such mapping is found, we take all FEs as the candidate set.
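The candidate-restricted scoring can be illustrated with a small sketch. The embeddings and the compound encoding below are made-up toy values (a trained system would produce them via the feed-forward network and the frame embedding layer); each candidate frame is scored by its dot product with the compound encoding, and the argmax is returned.

```python
def predict_frame(v_nc, candidate_frames, frame_emb):
    """Score each candidate frame by the dot product of the compound
    encoding with the frame's embedding, and return the argmax."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(candidate_frames, key=lambda f: dot(v_nc, frame_emb[f]))

# Toy example for 'divorce rate' (embeddings are illustrative, not trained).
frame_emb = {
    "ASSESSING":         [0.9, -0.2, 0.1],
    "PROPORTION":        [0.1, 0.8, 0.3],
    "SPEED_DESCRIPTION": [-0.5, 0.0, 0.7],
}
v_nc = [0.2, 0.9, 0.2]  # encoding of the compound from the feed-forward network
print(predict_frame(v_nc, ["ASSESSING", "PROPORTION", "SPEED_DESCRIPTION"], frame_emb))  # PROPORTION
```

Only the candidate frames are ever scored, which is the computational saving discussed above.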

Frame and Frame Element Embeddings
Inspired by the approach of Kumar et al. (2019) for the task of Word Sense Disambiguation (WSD), we propose a similar approach for NC interpretation. Our approach uses the definitions of entities (along with the relations) to learn entity embeddings and relation embeddings: an encoder (a Bi-LSTM) encodes the definition of an entity, and the encoded representation is used as the entity's embedding in ConvE. During training, both the encoder and ConvE are optimized. After training, the encodings of the definitions are taken as the entity embeddings.
We train ConvE twice to get frame and frame element embeddings separately. ConvE training is independent of the main training.

Experimental Setup
In this section, we explain our dataset, baseline, training, and evaluation metrics.

Dataset Creation and Analysis
We use the dataset released by Ponkiya et al. (2018a) as D1. The dataset contains 1546 noun-noun compounds with two labels each: a frame and an FE. It was created by extracting noun compounds along with their labels from the FrameNet data. As the extraction is automatic and the manual step only confirms the correctness of the labelling, the labels are not exhaustive. For instance, the noun compound student demonstration has been annotated with PROTEST:PROTESTER; however, the labels REASONING:ARGUER and CAUSE TO PERCEIVE:ACTOR are also applicable. So, we annotate more examples with all possible labels.

(Footnote 3: https://code.google.com/archive/p/word2vec/)

Manual Annotation
We manually annotate 326 noun compounds and call this set D2. We extend D1 by merging the examples from D2 to perform our experiments. The annotation was performed by one of the authors, so a discussion of inter-annotator agreement is not warranted; however, the annotations are still performed manually by a human, which allows us to consider them gold-standard. The author chose the examples randomly from Tratz and Hovy (2010)'s dataset. During the annotation process, we encountered difficulties because of FrameNet's coverage issues. In particular, the word-to-frame mapping in FrameNet has a coverage issue, which has been widely reported in the literature (Pavlick et al., 2015; Botschen et al., 2017).
We categorize the coverage issues as follows. No Candidate Frames: the word-to-frame mapping returned no candidate frame. In some cases, we could find a frame with manual effort (ref. Table 1); in other cases, we could not find an appropriate frame even with manual effort (e.g., star autograph, employee misconduct, etc.). No Appropriate Candidate Frame: the candidate set is non-empty, but none of its frames is appropriate. For instance, the candidate frames for heat include CHANGE OF TEMPERATURE, but none of the candidate frames is appropriate for body heat.
In some cases, we could find an appropriate frame, but it was not part of the candidate set. For example, for the noun compound ulcer drug, the candidate frames are INTOXICANTS and CAUSE HARM, but the appropriate frame is CURE.
No Suitable Frame Element in the Frame: We could find an appropriate frame, but no frame element from the frame is appropriate. For instance, the BUSINESS frame is suitable for retail operation, but no frame element from the frame is suitable for the modifier noun retail.

NC Annotation Tool
To handle the first two cases (finding a frame), we use synonyms from WordNet (Miller, 1994) and FrameNet+ data (Pavlick et al., 2015). To simplify the annotation process, we develop a tool (Figure 2) that makes annotation easier.

We split each dataset, D1 and D1+D2, randomly for 5-fold validation. Each fold contains three disjoint sets: a training set (60% of the compounds), a validation set (20%), and a test set (20%). We use the same folds across all experiments, so results across different models are comparable.

Frame and Frame Element Embeddings
To get frame embeddings, we consider frames as entities and frame relations from FrameNet as relations between the entities, then train ConvE (§4.2) to learn the frame embeddings. We use these entity embeddings to initialize the frame embedding layer. Table 2 shows the ten frames most similar to the FRIENDLY OR HOSTILE frame based on cosine similarity between frame embeddings. Similarly, we get embeddings of frame elements using the frame elements and their relations in FrameNet.
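The similarity ranking behind Table 2 can be computed directly from the learned embeddings. A minimal sketch with toy 2-dimensional vectors follows (real frame embeddings are higher-dimensional ConvE outputs; the frame names and values here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(target, embeddings, k=10):
    """Rank all frames other than the target by cosine similarity to it."""
    others = [f for f in embeddings if f != target]
    return sorted(others,
                  key=lambda f: cosine(embeddings[target], embeddings[f]),
                  reverse=True)[:k]

# Toy 2-d frame embeddings (illustrative only).
emb = {
    "FRIENDLY_OR_HOSTILE": [1.0, 0.0],
    "HOSTILE_ENCOUNTER":   [0.9, 0.1],
    "CURE":                [0.0, 1.0],
}
print(most_similar("FRIENDLY_OR_HOSTILE", emb, k=2))  # ['HOSTILE_ENCOUNTER', 'CURE']
```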

Baseline
The first baseline is random prediction: the probability of predicting each label from a candidate set is uniform. We take expected counts to compute the metrics; for instance, the expected random accuracy is

Acc_random = (1/N) Σ_i 1/|C_i|,

where C_i is the candidate set for the i-th test instance and N is the number of test instances. We also use Support Vector Machines (SVM) (Cortes and Vapnik, 1995) as a baseline for this task, implemented with the sklearn library (Pedregosa et al., 2011). The input to the SVM-based approach is the concatenated vector of the individual lexical units.
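Since each instance contributes 1/|C_i| to the expected number of correct random predictions, the expected accuracy reduces to a mean over candidate-set sizes, as this small sketch shows (the candidate-set sizes in the example are invented):

```python
def random_baseline_accuracy(candidate_set_sizes):
    """Expected accuracy of uniform random prediction: the mean of 1/|C_i|
    over all test instances, where C_i is instance i's candidate label set."""
    return sum(1.0 / size for size in candidate_set_sizes) / len(candidate_set_sizes)

# e.g. three test compounds whose candidate sets contain 2, 4 and 5 frames
print(random_baseline_accuracy([2, 4, 5]))  # (1/2 + 1/4 + 1/5) / 3 ≈ 0.3167
```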
We provide results for another baseline approach where we use the same architecture ( §4.1) with random initialization for frame/FE embeddings.

Training
Given a noun compound, we get the candidate labels using the FrameNet mapping. We compute scores for the candidate labels and compare them with the target to compute the loss. We minimize categorical cross-entropy with stochastic gradient descent (with momentum). The frame/FE embedding layer remains fixed (non-trainable) for the initial few epochs. As the stopping criterion, we monitor performance on the validation set.
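The loss over the candidate set can be sketched as a softmax restricted to the candidate labels followed by categorical cross-entropy. This is a simplified pure-Python illustration of the objective, not the PyTorch implementation:

```python
import math

def candidate_cross_entropy(scores, gold_index):
    """Softmax over the candidate labels' scores, then the negative log
    probability of the gold label (categorical cross-entropy)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[gold_index])

scores = [2.0, 0.5, 0.1]                   # dot-product scores for 3 candidates
low = candidate_cross_entropy(scores, 0)   # gold is the top-scored candidate
high = candidate_cross_entropy(scores, 2)  # gold is the lowest-scored candidate
print(low < high)  # True: the loss rewards ranking the gold label highest
```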

Evaluation
We report weighted precision, recall, and F1-score for our experiments. The weight for each label is proportional to the number of test examples for that label. The (weighted) precision is computed as

P = Σ_l (N_l / N) · P_l,  with  P_l = TP_l / (TP_l + FP_l),

where P_l is the precision for label l, TP_l is the number of true positives and FP_l the number of false positives for label l, N_l is the number of instances with label l in the test set, and N is the total number of instances in the test set.
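The weighted precision can be computed from per-label counts as follows (weighted recall is analogous, with false negatives in place of false positives; the counts in the example are invented):

```python
def weighted_precision(tp, fp, support):
    """P = sum over labels l of (N_l / N) * P_l, where P_l = TP_l / (TP_l + FP_l),
    N_l is label l's support in the test set, and N is the total test size."""
    N = sum(support.values())
    total = 0.0
    for label, n_l in support.items():
        denom = tp.get(label, 0) + fp.get(label, 0)
        p_l = tp.get(label, 0) / denom if denom else 0.0
        total += (n_l / N) * p_l
    return total

# Two labels: A has precision 3/4, B has precision 1/2; supports are 4 and 2.
tp, fp, support = {"A": 3, "B": 1}, {"A": 1, "B": 1}, {"A": 4, "B": 2}
print(weighted_precision(tp, fp, support))  # (4/6)*0.75 + (2/6)*0.5 = 0.666...
```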
The above metrics are based on the top prediction. We also report accuracy at k, which treats a prediction as correct if the gold label is among the top k predicted labels. We report (micro-averaged) accuracy, computed using the following formula:

Acc = (number of correctly classified instances) / (total number of instances in the test set)

We compute all of these metrics for frames and frame elements independently, using the Scikit-learn library (Pedregosa et al., 2011).
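Accuracy at k can be sketched directly from ranked prediction lists (the labels and rankings below are illustrative):

```python
def accuracy_at_k(ranked_predictions, gold_labels, k):
    """Fraction of test instances whose gold label appears among the
    top-k predicted labels."""
    hits = sum(1 for preds, gold in zip(ranked_predictions, gold_labels)
               if gold in preds[:k])
    return hits / len(gold_labels)

preds = [["PROPORTION", "ASSESSING"],   # ranked predictions, best first
         ["CURE", "INTOXICANTS"]]
gold = ["ASSESSING", "CURE"]
print(accuracy_at_k(preds, gold, 1))  # 0.5: only the second instance is right at k=1
print(accuracy_at_k(preds, gold, 2))  # 1.0
```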

Results and Analysis
The reported results are averaged across 5-fold cross-validation. We define the unseen set as the subset of test samples whose output label has no samples in the training set. In one fold (of D1) with 310 test samples, the unseen set comprises (1) 30 unique frames, covering 32 test samples, and (2) 75 unique FEs, covering 82 test samples. These cases are challenging to handle; our prediction in continuous space helps here, as the target space embeds the labels. Table 3 reports the performance of the baselines and our system for frame prediction on the entire test set. Our system beats the random baseline and the SVM model across all metrics by a significant margin. Similarly, as seen in Table 4, frame prediction on the unseen set improves over the random baseline both WITH and WITHOUT frame embeddings, and the improvement is substantial on both datasets (D1 and D1+D2). We attribute this improvement to the fact that our model captures the semantics of a frame in a continuous space better than the baselines do. However, as seen in Table 3, the model WITHOUT frame embeddings shows results comparable to the model WITH frame embeddings; the marginal difference does not appear significant. We expect that, as the dataset grows, the system with frame embeddings will outperform the system without them.
Having discussed frame prediction above, we now present our results for frame element prediction. As Table 5 shows, our method outperforms the random baseline and the SVM-based model for frame element prediction as well; the improvements over both datasets (D1 and D1+D2) are at least 5% (D1+D2/SVM). With the extended dataset (D1+D2), our approach improves across all three measures (precision, recall and F-score). In Table 3, for frame prediction, we observe an increase in the stronger baseline's (SVM) score when the extended dataset is used. However, for frame element prediction, the model that uses the pre-trained embeddings is significantly outperformed by the model that does not. On manual analysis of our training set, we find multiple cases where the number of examples per frame element is very small (sometimes even 1, as discussed below). This creates a data skew: such a sample is used either for training or for testing, so the model is either untrained for that test case or never tested on that frame element. In Table 6, the random baseline outperforms our method because of this data skew; multiple frame elements present in the unseen test data have no training examples at all. We do not report SVM performance in Tables 4 and 6, since the precision, recall, and F-score for SVM were all 0. This can be attributed to the fact that SVM does not handle unseen labels, and in this case does not perform at all.
For frame element prediction, Table 5 shows that the extended dataset helps improve the overall quality of predictions, with an improved score for each approach, including the baseline. These results indicate that the dataset extension does indeed help the task of NC interpretation.
In Figure 3, we see that the average number of candidate FEs for a test sample is only 26. Without even a single training example, our system correctly predicts 51.22% (more than half) of unseen samples among the top-7 predictions, which is higher than the top-1 accuracy of any baseline approach on the complete test set. As k increases, the margin between our system and the baseline remains nearly the same on the whole test set; on unseen labels, however, the system with FE embeddings increasingly outperforms the baseline as k grows.
Overall, we observe a significant improvement in results with the help of our method. We also show that our extended dataset does help improve the performance of models in both cases.

Conclusion and Future Work
In this paper, we proposed a novel method of using FrameNet for NC interpretation in a continuous space. We use FrameNet mappings (word to frame, and frame to frame element) to prune the search space. Our approach of predicting in a continuous space outperforms the random baseline and a stronger baseline. We show that the label embeddings generated with our approach help in generalising over unseen labels.
We annotated more noun compounds and analysed the issues in finding frames and frame elements; our study of these coverage issues helped us develop a tool that assists annotators in finding an appropriate frame, which we create and release for further research. Using this tool, we extended the existing annotated dataset, which is small compared to the overall number of labels, by annotating more NCs for various labels, and we show that this extension improves system performance. We provide promising results for the task of frame prediction, and we analyse and discuss the results in detail for both the frame and frame element prediction tasks.
In the future, we aim to find other ways of using FrameNet data for this task. We would also like to investigate why our approach provides promising results for the task of frame prediction but not for frame element prediction. We would like to explore more approaches to predict the frame elements effectively. We believe that using FrameNet embeddings can prove to be helpful for other tasks.