Multi-word Lexical Units Recognition in WordNet

Marek Maziarz, Ewa Rudnicka, Łukasz Grabowski


Abstract
WordNet is a state-of-the-art lexical resource used in many Natural Language Processing tasks, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can indisputably be called lexicalised. Some of them are semantically compositional and show no signs of idiosyncrasy. This affects all evaluation measures that use the list of all WordNet MWEs as a gold standard. We propose a method of distinguishing between lexicalised and non-lexicalised word combinations in WordNet, taking into account lexicality features such as semantic compositionality, MWE length, and a translational criterion. Both a rule-based approach and a ridge logistic regression are applied; both beat a random baseline in precision when singling out lexicalised MWEs, and in recall when ruling out non-lexicalised MWEs.
Anthology ID:
2022.mwe-1.8
Volume:
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
MWE
SIG:
SIGLEX
Publisher:
European Language Resources Association
Pages:
49–54
URL:
https://aclanthology.org/2022.mwe-1.8
Cite (ACL):
Marek Maziarz, Ewa Rudnicka, and Łukasz Grabowski. 2022. Multi-word Lexical Units Recognition in WordNet. In Proceedings of the 18th Workshop on Multiword Expressions @LREC2022, pages 49–54, Marseille, France. European Language Resources Association.
Cite (Informal):
Multi-word Lexical Units Recognition in WordNet (Maziarz et al., MWE 2022)
PDF:
https://aclanthology.org/2022.mwe-1.8.pdf