The Impact of Word Representations on Sequential Neural MWE Identification

Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati


Abstract
Recent initiatives such as the PARSEME shared task allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
Anthology ID:
W19-5121
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
169–175
URL:
https://aclanthology.org/W19-5121
DOI:
10.18653/v1/W19-5121
Cite (ACL):
Nicolas Zampieri, Carlos Ramisch, and Geraldine Damnati. 2019. The Impact of Word Representations on Sequential Neural MWE Identification. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 169–175, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
The Impact of Word Representations on Sequential Neural MWE Identification (Zampieri et al., MWE 2019)
PDF:
https://aclanthology.org/W19-5121.pdf