@inproceedings{bhatia-etal-2021-fine,
title = "Fine-tuning Distributional Semantic Models for Closely-Related Languages",
author = "Bhatia, Kushagra and
Aggarwal, Divyanshu and
Vaidya, Ashwini",
editor = {Zampieri, Marcos and
Nakov, Preslav and
Ljube{\v{s}}i{\'c}, Nikola and
Tiedemann, J{\"o}rg and
Scherrer, Yves and
Jauhiainen, Tommi},
booktitle = "Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects",
month = apr,
year = "2021",
address = "Kiyv, Ukraine",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.vardial-1.7",
pages = "60--66",
abstract = "In this paper we compare the performance of three models: SGNS (skip-gram negative sampling) and augmented versions of SVD (singular value decomposition) and PPMI (Positive Pointwise Mutual Information) on a word similarity task. We particularly focus on the role of hyperparameter tuning for Hindi based on recommendations made in previous work (on English). Our results show that there are language specific preferences for these hyperparameters. We extend the best settings for Hindi to a set of related languages: Punjabi, Gujarati and Marathi with favourable results. We also find that a suitably tuned SVD model outperforms SGNS for most of our languages and is also more robust in a low-resource setting.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="bhatia-etal-2021-fine">
<titleInfo>
<title>Fine-tuning Distributional Semantic Models for Closely-Related Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kushagra</namePart>
<namePart type="family">Bhatia</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Divyanshu</namePart>
<namePart type="family">Aggarwal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ashwini</namePart>
<namePart type="family">Vaidya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects</title>
</titleInfo>
<name type="personal">
<namePart type="given">Marcos</namePart>
<namePart type="family">Zampieri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Preslav</namePart>
<namePart type="family">Nakov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nikola</namePart>
<namePart type="family">Ljubešić</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jörg</namePart>
<namePart type="family">Tiedemann</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yves</namePart>
<namePart type="family">Scherrer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tommi</namePart>
<namePart type="family">Jauhiainen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Kyiv, Ukraine</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this paper we compare the performance of three models: SGNS (skip-gram negative sampling) and augmented versions of SVD (singular value decomposition) and PPMI (Positive Pointwise Mutual Information) on a word similarity task. We particularly focus on the role of hyperparameter tuning for Hindi based on recommendations made in previous work (on English). Our results show that there are language-specific preferences for these hyperparameters. We extend the best settings for Hindi to a set of related languages: Punjabi, Gujarati and Marathi, with favourable results. We also find that a suitably tuned SVD model outperforms SGNS for most of our languages and is also more robust in a low-resource setting.</abstract>
<identifier type="citekey">bhatia-etal-2021-fine</identifier>
<location>
<url>https://aclanthology.org/2021.vardial-1.7</url>
</location>
<part>
<date>2021-04</date>
<extent unit="page">
<start>60</start>
<end>66</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Fine-tuning Distributional Semantic Models for Closely-Related Languages
%A Bhatia, Kushagra
%A Aggarwal, Divyanshu
%A Vaidya, Ashwini
%Y Zampieri, Marcos
%Y Nakov, Preslav
%Y Ljubešić, Nikola
%Y Tiedemann, Jörg
%Y Scherrer, Yves
%Y Jauhiainen, Tommi
%S Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
%D 2021
%8 April
%I Association for Computational Linguistics
%C Kyiv, Ukraine
%F bhatia-etal-2021-fine
%X In this paper we compare the performance of three models: SGNS (skip-gram negative sampling) and augmented versions of SVD (singular value decomposition) and PPMI (Positive Pointwise Mutual Information) on a word similarity task. We particularly focus on the role of hyperparameter tuning for Hindi based on recommendations made in previous work (on English). Our results show that there are language-specific preferences for these hyperparameters. We extend the best settings for Hindi to a set of related languages: Punjabi, Gujarati and Marathi, with favourable results. We also find that a suitably tuned SVD model outperforms SGNS for most of our languages and is also more robust in a low-resource setting.
%U https://aclanthology.org/2021.vardial-1.7
%P 60-66
[Fine-tuning Distributional Semantic Models for Closely-Related Languages](https://aclanthology.org/2021.vardial-1.7) (Bhatia et al., VarDial 2021)
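For readers who want a concrete picture of the count-based pipeline the abstract refers to, the following is a minimal sketch (not the authors' implementation): it builds a word-context co-occurrence matrix from a toy corpus, applies Positive Pointwise Mutual Information (PPMI), factorizes the result with truncated SVD, and scores word similarity by cosine. The corpus, window size, dimensionality, and the square-root singular-value weighting are illustrative assumptions, not settings taken from the paper.

```python
# Minimal PPMI + SVD word-similarity sketch (illustrative assumptions only).
from collections import Counter
import numpy as np

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2   # symmetric context window (assumed)
dim = 2      # embedding dimensionality (assumed, tiny for the toy corpus)

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-context co-occurrence counts within the window.
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(idx[w], idx[sent[j]])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    C[i, j] = c

# PPMI: max(0, log P(w,c) / (P(w) P(c))).
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD of the PPMI matrix; rows of U * sqrt(S) serve as word vectors
# (one common weighting choice among the hyperparameters such work tunes).
U, S, _ = np.linalg.svd(ppmi)
vecs = U[:, :dim] * np.sqrt(S[:dim])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(cosine(vecs[idx["cat"]], vecs[idx["dog"]]))
```

The sketch fixes arbitrary values for the window and dimensionality; the paper's point is that such hyperparameters benefit from language-specific tuning rather than the defaults recommended for English.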