Neural Lemmatization of Multiword Expressions

Marine Schmitt, Mathieu Constant


Abstract
This article focuses on the lemmatization of multiword expressions (MWEs). We propose a deep encoder-decoder architecture that generates, for every word of an MWE, its corresponding part of the lemma, based on the internal context of the MWE. The encoder relies on recurrent networks over (1) the character sequence of each individual word, to capture its morphological properties, and (2) the word sequence of the MWE, to capture lexical and syntactic properties. The decoder, which generates the corresponding part of the lemma for each word of the MWE, is a classical character-level attention-based recurrent model. Our model is evaluated on Italian, French, Polish and Portuguese and performs well on all of these languages except Polish.
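To make the per-word framing in the abstract concrete, the following is a minimal sketch of how one MWE might be turned into per-word examples: each word contributes its character sequence, the surrounding word sequence of the MWE, and the lemma part to be generated. The names `WordExample` and `build_examples`, the one-lemma-part-per-word alignment, and the French example are illustrative assumptions, not taken from the paper.

```python
from typing import List, NamedTuple


class WordExample(NamedTuple):
    """One hypothetical training example for a single word of an MWE."""
    chars: List[str]    # character sequence of the target word (morphology)
    context: List[str]  # word sequence of the whole MWE (internal context)
    position: int       # index of the target word inside the MWE
    target: List[str]   # characters of the lemma part to be generated


def build_examples(mwe: str, lemma: str) -> List[WordExample]:
    """Pair every MWE word with its part of the lemma.

    Assumption: the lemma has exactly one whitespace-separated part
    per MWE word; this sketch does not handle other alignments.
    """
    words = mwe.split()
    parts = lemma.split()
    if len(words) != len(parts):
        raise ValueError("sketch assumes one lemma part per MWE word")
    return [
        WordExample(list(word), words, index, list(part))
        for index, (word, part) in enumerate(zip(words, parts))
    ]


# Illustrative French example (not from the paper): plural form to lemma.
examples = build_examples("cartes bleues", "carte bleue")
```

In a setup like this, an encoder would consume `chars` and `context`, and a character-level decoder would be trained to emit `target`; the model itself is deliberately left out of the sketch.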
Anthology ID:
W19-5117
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
142–148
URL:
https://aclanthology.org/W19-5117
DOI:
10.18653/v1/W19-5117
Cite (ACL):
Marine Schmitt and Mathieu Constant. 2019. Neural Lemmatization of Multiword Expressions. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 142–148, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Neural Lemmatization of Multiword Expressions (Schmitt & Constant, MWE 2019)
PDF:
https://aclanthology.org/W19-5117.pdf