Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models

Raghuraman Swaminathan, Paul Cook


Abstract
In this paper, we consider novel cross-lingual settings for multiword expression (MWE) identification (Ramisch et al., 2020) and idiomaticity prediction (Tayyar Madabushi et al., 2022) in which systems are tested on languages that are unseen during training. Our findings indicate that pre-trained multilingual language models are able to learn knowledge about MWEs and idiomaticity that is not language-specific. Moreover, we find that training data from other languages can be leveraged to give improvements over monolingual models.
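The abstract frames MWE identification at the token level, which in practice is typically cast as token classification over a pre-trained multilingual encoder fine-tuned on source-language annotations and evaluated on an unseen target language. The sketch below illustrates that general setup only; the checkpoint (bert-base-multilingual-cased), the BIO-style tag set, and the example sentence are illustrative assumptions, not the authors' released code or exact configuration, and the classification head shown is untrained.

```python
# Minimal sketch: token-level MWE identification as token classification
# with a pre-trained multilingual language model. In the cross-lingual
# setting described in the abstract, this model would be fine-tuned on
# annotated sentences from source languages and tested on a language
# unseen during training. Labels, checkpoint, and example are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-MWE", "I-MWE"]  # assumed BIO-style tags for MWE spans

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
)

# Example target-language sentence (French idiom "casser sa pipe", 'to die').
words = ["Il", "a", "cassé", "sa", "pipe", "hier"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, num_subwords, num_labels)

pred_ids = logits.argmax(dim=-1)[0].tolist()

# Map subword-level predictions back to the first subword of each word.
word_ids = enc.word_ids(batch_index=0)
seen = set()
for idx, wid in enumerate(word_ids):
    if wid is not None and wid not in seen:
        seen.add(wid)
        print(f"{words[wid]:<8} {LABELS[pred_ids[idx]]}")
```

Because the head is randomly initialized here, the printed tags are meaningless until the model is fine-tuned; the point of the sketch is only the data flow from words to per-token MWE labels that the paper's cross-lingual experiments build on.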
Anthology ID:
2023.mwe-1.1
Volume:
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/2023.mwe-1.1
DOI:
10.18653/v1/2023.mwe-1.1
Cite (ACL):
Raghuraman Swaminathan and Paul Cook. 2023. Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 1–6, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models (Swaminathan & Cook, MWE 2023)
PDF:
https://aclanthology.org/2023.mwe-1.1.pdf
Video:
https://aclanthology.org/2023.mwe-1.1.mp4