Discriminating Similar Languages with Linear SVMs and Neural Networks

Çağrı Çöltekin, Taraka Rama


Abstract
This paper describes the systems we experimented with for the Discriminating between Similar Languages (DSL) 2016 shared task. We submitted the results of a single system based on support vector machines (SVMs) with a linear kernel and character n-gram features, which ranked first in the closed training track for test set A. Besides the linear SVM, we also report additional experiments with a number of deep learning architectures. Despite our intuition that non-linear deep learning methods should be advantageous, linear models seem to fare better in this task, at least with the amount of data available and the amount of effort we spent on tuning these models.
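The core idea in the abstract, a linear classifier over character n-gram counts, can be illustrated with a small sketch. This is not the authors' implementation: it is a dependency-free toy that assumes overlapping n-grams of length 1 to 3, a one-vs-rest hinge loss trained by plain subgradient descent, and made-up training strings. The function names (`char_ngrams`, `train_linear_svm`, `predict`) and all hyperparameters are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count overlapping character n-grams of length n_min..n_max.

    The text is padded with one space on each side so that word-initial
    and word-final n-grams are distinguishable (an assumption of this sketch).
    """
    padded = f" {text} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def score(weights, feats):
    """Linear score: dot product of a sparse weight dict and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_linear_svm(docs, labels, epochs=20, lr=0.1, reg=0.01):
    """One-vs-rest linear SVM trained with hinge-loss subgradient steps.

    Hyperparameters are illustrative defaults, not values from the paper.
    """
    feats = [char_ngrams(d) for d in docs]
    classes = sorted(set(labels))
    weights = {c: {} for c in classes}
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            for c in classes:
                w = weights[c]
                target = 1.0 if y == c else -1.0
                margin = target * score(w, x)
                # L2 regularization: shrink every weight a little.
                for f in w:
                    w[f] *= 1.0 - lr * reg
                # Hinge loss is active only when the margin is below 1.
                if margin < 1.0:
                    for f, v in x.items():
                        w[f] = w.get(f, 0.0) + lr * target * v
    return weights


def predict(weights, text):
    """Return the class whose linear score is highest."""
    x = char_ngrams(text)
    return max(weights, key=lambda c: score(weights[c], x))


# Made-up toy data standing in for two "languages"; not from the DSL corpus.
toy_docs = ["aaaa aaaa", "aaa aa", "bbbb bbbb", "bbb bb"]
toy_labels = ["A", "A", "B", "B"]
model = train_linear_svm(toy_docs, toy_labels)
print(predict(model, "aa aaa"), predict(model, "bb bbb"))
```

In practice one would more likely use an established library for both feature extraction and SVM training; the sketch only makes the shape of the approach concrete.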
Anthology ID: W16-4802
Volume: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue: VarDial
Publisher: The COLING 2016 Organizing Committee
Pages: 15–24
URL: https://aclanthology.org/W16-4802
Cite (ACL): Çağrı Çöltekin and Taraka Rama. 2016. Discriminating Similar Languages with Linear SVMs and Neural Networks. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 15–24, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Discriminating Similar Languages with Linear SVMs and Neural Networks (Çöltekin & Rama, VarDial 2016)
PDF: https://aclanthology.org/W16-4802.pdf