Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space

Filip Klubička, Vasudevan Nedumpozhimana, John Kelleher


Abstract
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information, to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work on improving its suitability for a probing analysis.
Anthology ID:
2023.mwe-1.8
Volume:
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
45–57
URL:
https://aclanthology.org/2023.mwe-1.8
DOI:
10.18653/v1/2023.mwe-1.8
Cite (ACL):
Filip Klubička, Vasudevan Nedumpozhimana, and John Kelleher. 2023. Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 45–57, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space (Klubička et al., MWE 2023)
PDF:
https://aclanthology.org/2023.mwe-1.8.pdf
Video:
https://aclanthology.org/2023.mwe-1.8.mp4