Measuring the compositionality of NV expressions in Basque by means of distributional similarity techniques

Antton Gurrutxaga, Iñaki Alegria


Abstract
We present several experiments aimed at measuring the semantic compositionality of NV expressions in Basque. Our approach is based on the hypothesis that compositionality can be related to distributional similarity. The contexts of each NV expression are compared with the contexts of its corresponding components by means of different techniques, such as similarity measures commonly used with the Vector Space Model (VSM), Latent Semantic Analysis (LSA), and some measures implemented in the Lemur Toolkit, such as the Indri index, tf-idf, the Okapi index and Kullback-Leibler divergence. Taking our previous work with cooccurrence techniques as a baseline, the results point to improvements when using the Indri index or Kullback-Leibler divergence, and to a slight further improvement when these are combined with cooccurrence measures such as $t$-score via rank aggregation. This work is part of a project for MWE extraction and characterization that uses different techniques to measure the properties related to idiomaticity, such as institutionalization, non-compositionality and lexico-syntactic fixedness.
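The comparison of contexts described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Lemur Toolkit: the function names, the additive smoothing constant `alpha` and the toy context windows are all assumptions made for illustration. It builds bag-of-words context vectors and compares them with cosine similarity (a usual VSM measure) and with a smoothed Kullback-Leibler divergence.

```python
import math
from collections import Counter


def context_vector(contexts):
    """Count the words seen across a list of tokenized context windows."""
    return Counter(word for window in contexts for word in window)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kl_divergence(p_counts, q_counts, alpha=0.5):
    """KL(P || Q) over the joint vocabulary.

    Additive smoothing (alpha, an assumed value) keeps every
    probability non-zero so the logarithm is always defined.
    """
    vocab = p_counts.keys() | q_counts.keys()
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


# Invented toy context windows, for illustration only.
expression_contexts = [["meeting", "decision", "board"], ["decision", "vote"]]
verb_contexts = [["book", "shelf", "hand"], ["bus", "station", "hand"]]

expr_vec = context_vector(expression_contexts)
verb_vec = context_vector(verb_contexts)

similarity = cosine(expr_vec, verb_vec)
divergence = kl_divergence(expr_vec, verb_vec)
```

Under the hypothesis stated in the abstract, a low similarity (or a high divergence) between an expression and its components would suggest lower compositionality. Candidates can then be ranked by either score.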
Anthology ID:
L12-1283
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2389–2394
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/514_Paper.pdf
Cite (ACL):
Antton Gurrutxaga and Iñaki Alegria. 2012. Measuring the compositionality of NV expressions in Basque by means of distributional similarity techniques. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2389–2394, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Measuring the compositionality of NV expressions in Basque by means of distributional similarity techniques (Gurrutxaga & Alegria, LREC 2012)