Sara Rodríguez-Fernández


2018

2016

WordNet is probably the best-known lexical resource in Natural Language Processing. While it is widely regarded as a high-quality repository of concepts and semantic relations, updating and extending it manually is costly. One type of information that could add considerable value to WordNet is collocational information, which is paramount in tasks such as Machine Translation, Natural Language Generation and Second Language Learning. In this paper, we present ColWordNet (CWN), an extended version of WordNet with fine-grained collocational information, introduced automatically by a method that exploits linear relations between analogous sense-level embedding spaces. We perform both intrinsic and extrinsic evaluations, and release CWN for the use and scrutiny of the community.
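The idea of a linear relation between two embedding spaces can be illustrated with a minimal sketch. It assumes that paired base and collocate vectors are available and fits a matrix by ordinary least squares. The abstract does not specify the fitting procedure, so the function names (`fit_linear_map`, `nearest`), the synthetic data and the use of least squares are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_linear_map(base_vecs, collocate_vecs):
    """Least-squares fit of W such that base_vecs @ W ~ collocate_vecs."""
    W, *_ = np.linalg.lstsq(base_vecs, collocate_vecs, rcond=None)
    return W

def nearest(query, vocab_vecs, vocab_words, k=1):
    """Return the k vocabulary words whose vectors are most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    V = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    order = np.argsort(-(V @ q))[:k]
    return [vocab_words[i] for i in order]

# Synthetic data (assumption): collocate vectors are an exact linear
# image of base vectors, which real embeddings only approximate.
rng = np.random.default_rng(0)
dim = 8
true_map = rng.normal(size=(dim, dim))
train_bases = rng.normal(size=(50, dim))
train_collocates = train_bases @ true_map

W = fit_linear_map(train_bases, train_collocates)

# Map an unseen base into the collocate space and look up its neighbour.
new_base = rng.normal(size=dim)
vocab_words = ["target", "noise_a", "noise_b", "noise_c"]
vocab_vecs = np.vstack([new_base @ true_map, rng.normal(size=(3, dim))])
predicted = nearest(new_base @ W, vocab_vecs, vocab_words)
```

With real sense-level embeddings the fit would be approximate, so the nearest neighbours of the mapped vector would serve as ranked collocate candidates rather than as a single exact match.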
Collocations such as “heavy rain” or “make [a] decision” are combinations of two elements where one (the base) is freely chosen, while the choice of the other (the collocate) is restricted by the base. Collocations present difficulties even to advanced language learners, who often struggle to find the right collocate to express a particular meaning. For example, both “heavy” and “strong” express the meaning ‘intense’, but while “rain” selects “heavy”, “wind” selects “strong”. Lexical Functions (LFs) describe the meanings that hold between the elements of collocations, such as ‘intense’, ‘perform’, ‘create’, ‘increase’, etc. Language resources with semantically classified collocations would be of great help to students; however, they are scarce and, being manually constructed, expensive to build. We present an unsupervised approach to the acquisition and semantic classification of collocations according to LFs, based on word embeddings: given an example of a collocation for each of the target LFs and a set of bases, the system retrieves a list of collocates for each base and LF.
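One plausible way to read "given an example collocation per LF and a base, retrieve collocates" is the familiar vector-offset analogy over word embeddings. The sketch below assumes that reading; the abstract does not state the exact operation. The six-dimensional toy vectors are hand-built so that the analogy holds exactly, and the names `EMB`, `LF_EXAMPLES` and `retrieve_collocates` are hypothetical.

```python
import numpy as np

# Toy embeddings (assumption): each collocate is its base plus an
# LF-specific offset, so the analogy below holds exactly.
EMB = {
    "rain":     np.array([1.0, 0, 0, 0, 0, 0]),
    "wind":     np.array([0, 1.0, 0, 0, 0, 0]),
    "decision": np.array([0, 0, 1.0, 0, 0, 0]),
    "walk":     np.array([0, 0, 0, 1.0, 0, 0]),
    "heavy":    np.array([1.0, 0, 0, 0, 1.0, 0]),  # rain + 'intense'
    "strong":   np.array([0, 1.0, 0, 0, 1.0, 0]),  # wind + 'intense'
    "make":     np.array([0, 0, 1.0, 0, 0, 1.0]),  # decision + 'perform'
    "take":     np.array([0, 0, 0, 1.0, 0, 1.0]),  # walk + 'perform'
}

# One (base, collocate) example per target LF.
LF_EXAMPLES = {
    "intense": ("rain", "heavy"),
    "perform": ("decision", "make"),
}

def retrieve_collocates(base, example, emb, k=1):
    """Rank candidate collocates for `base` by analogy with `example`."""
    ex_base, ex_collocate = example
    query = emb[ex_collocate] - emb[ex_base] + emb[base]
    query = query / np.linalg.norm(query)
    excluded = {base, ex_base, ex_collocate}  # never return the inputs
    scored = []
    for word, vec in emb.items():
        if word in excluded:
            continue
        scored.append((float(vec @ query / np.linalg.norm(vec)), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

intense_for_wind = retrieve_collocates("wind", LF_EXAMPLES["intense"], EMB)
perform_for_walk = retrieve_collocates("walk", LF_EXAMPLES["perform"], EMB)
```

On these toy vectors, "wind" retrieves "strong" for ‘intense’ and "walk" retrieves "take" for ‘perform’. With embeddings trained on real corpora the top-ranked candidates would be noisier and would typically need filtering.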

2015