Combining Linguistic Features for the Detection of Croatian Multiword Expressions

Maja Buljan, Jan Šnajder


Abstract
As multiword expressions (MWEs) exhibit a range of idiosyncrasies, their automatic detection warrants the use of many different features. Tsvetkov and Wintner (2014) proposed a Bayesian network model that combines linguistically motivated features and also models their interactions. In this paper, we extend their model with new features and apply it to Croatian, a morphologically complex language with relatively free word order, achieving a satisfactory F1-score of 0.823. Furthermore, by comparing against (semi)naive Bayes models, we demonstrate that manually modeling feature interactions is indeed important. We make our annotated dataset of Croatian MWEs freely available.
Anthology ID:
W17-1727
Volume:
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Month:
April
Year:
2017
Address:
Valencia, Spain
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
194–199
URL:
https://aclanthology.org/W17-1727
DOI:
10.18653/v1/W17-1727
Cite (ACL):
Maja Buljan and Jan Šnajder. 2017. Combining Linguistic Features for the Detection of Croatian Multiword Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 194–199, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Combining Linguistic Features for the Detection of Croatian Multiword Expressions (Buljan & Šnajder, MWE 2017)
PDF:
https://aclanthology.org/W17-1727.pdf