Predicting Compositionality of Verbal Multiword Expressions in Persian

Mahtab Sarlak, Yalda Yarandi, Mehrnoush Shamsfard


Abstract
Identifying Verbal Multiword Expressions (VMWEs) is more challenging than identifying non-verbal MWEs because of their higher surface variability. VMWEs are linguistic units with varying degrees of semantic opacity, and they are difficult for computational models both to identify and to score for compositionality. This study presents a new approach to predicting the compositionality of VMWEs in Persian. The method first identifies VMWEs in Persian sentences automatically, treating the task as a sequence labeling problem that recognizes the components of each VMWE. It then builds word embeddings that better capture the semantic properties of VMWEs and uses them to determine the degree of compositionality according to multiple criteria. For identification, the study compares two neural architectures, BiLSTM and ParsBERT, and shows that the fine-tuned BERT model outperforms the BiLSTM model, reaching an F1 score of 89%. A word2vec embedding model is then trained to capture the semantics of the identified VMWEs and used to estimate their compositionality, yielding an accuracy of 70.9% in experiments on a collected dataset of expert-annotated compositional and non-compositional VMWEs.
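The abstract does not spell out the "multiple criteria" used to score compositionality. As a rough illustration only, the sketch below shows one common criterion: the cosine similarity between the embedding of a whole expression and the average of its components' embeddings. The function names, the 0.5 threshold, and the two-dimensional toy vectors are all assumptions made for this example and are not taken from the paper; in practice the vectors would come from a trained embedding model such as word2vec.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compositionality_score(expr_vec, component_vecs):
    """Similarity between the expression vector and the mean of its parts.

    Assumption: a higher score suggests a more compositional expression.
    """
    return cosine(expr_vec, average(component_vecs))


def is_compositional(expr_vec, component_vecs, threshold=0.5):
    """Binary decision; the threshold is a hypothetical value, not from the paper."""
    return compositionality_score(expr_vec, component_vecs) >= threshold


# Toy vectors (hypothetical, for illustration only).
transparent_expr = [1.0, 0.0]
transparent_parts = [[1.0, 0.0], [0.0, 1.0]]
opaque_expr = [0.0, 1.0]
opaque_parts = [[1.0, 0.0], [1.0, 0.0]]

transparent_score = compositionality_score(transparent_expr, transparent_parts)
opaque_score = compositionality_score(opaque_expr, opaque_parts)
```

A single threshold on one similarity measure is the simplest possible decision rule; a system that combines several criteria would need some way to weight or vote over them, which this sketch does not attempt.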
Anthology ID:
2023.mwe-1.5
Volume:
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
14–23
URL:
https://aclanthology.org/2023.mwe-1.5
DOI:
10.18653/v1/2023.mwe-1.5
Cite (ACL):
Mahtab Sarlak, Yalda Yarandi, and Mehrnoush Shamsfard. 2023. Predicting Compositionality of Verbal Multiword Expressions in Persian. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 14–23, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Predicting Compositionality of Verbal Multiword Expressions in Persian (Sarlak et al., MWE 2023)
PDF:
https://aclanthology.org/2023.mwe-1.5.pdf
Video:
https://aclanthology.org/2023.mwe-1.5.mp4