Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties

Pablo Gamallo, Iñaki Alegria, José Ramom Pichel, Manex Agirrezabal


Abstract
This article describes the systems submitted by the Citius_Ixa_Imaxin team to the Discriminating Similar Languages Shared Task 2016. The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. The evaluation results show that ranked dictionaries are more sound and stable across different domains, whereas basic Bayesian models perform reasonably well on in-domain datasets but lose performance when applied to out-of-domain texts.
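The abstract names the two strategies without giving their details, so the following is only a minimal sketch of what each could look like, not the authors' implementation. It assumes word-unigram features, an out-of-place rank distance for the ranked dictionaries, and add-one smoothing for Naive Bayes. All function names, the labels "A" and "B", and the toy corpus are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the paper may tokenize differently.
    return text.lower().split()


def rank_words(counts):
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    return sorted(counts, key=lambda w: (-counts[w], w))


def build_ranked_dictionary(texts, size=1000):
    # Map each of the `size` most frequent words to its rank (0 = most frequent).
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {word: rank for rank, word in enumerate(rank_words(counts)[:size])}


def rank_distance(text, ranked_dict, penalty):
    # Out-of-place distance: sum of rank differences between the text's own
    # ranking and the dictionary; words missing from it cost `penalty`.
    ranked = rank_words(Counter(tokenize(text)))
    return sum(
        abs(rank - ranked_dict[word]) if word in ranked_dict else penalty
        for rank, word in enumerate(ranked)
    )


def classify_by_rank(text, dictionaries, penalty=1000):
    # Choose the label whose ranked dictionary is closest to the text.
    return min(sorted(dictionaries),
               key=lambda lang: rank_distance(text, dictionaries[lang], penalty))


def train_naive_bayes(corpus):
    # corpus: {label: [training texts]} -> ({label: (log prior, counts)}, vocab)
    total_docs = sum(len(texts) for texts in corpus.values())
    model, vocab = {}, set()
    for lang, texts in corpus.items():
        counts = Counter(tok for t in texts for tok in tokenize(t))
        vocab.update(counts)
        model[lang] = (math.log(len(texts) / total_docs), counts)
    return model, vocab


def classify_naive_bayes(text, model, vocab):
    # Log prior plus add-one smoothed unigram log likelihoods.
    def score(lang):
        log_prior, counts = model[lang]
        denom = sum(counts.values()) + len(vocab)
        return log_prior + sum(
            math.log((counts[tok] + 1) / denom) for tok in tokenize(text)
        )
    return max(sorted(model), key=score)


# Toy training data (hypothetical, not taken from the shared task).
corpus = {
    "A": ["o gato come o peixe", "o cão bebe água"],
    "B": ["el gato come el pescado", "el perro bebe agua"],
}
dictionaries = {lang: build_ranked_dictionary(texts)
                for lang, texts in corpus.items()}
model, vocab = train_naive_bayes(corpus)
```

In this sketch the ranked-dictionary classifier depends only on which words are frequent and in what order, while the Naive Bayes scores depend on the probabilities estimated from the training texts.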
Anthology ID:
W16-4822
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
170–177
URL:
https://aclanthology.org/W16-4822
Cite (ACL):
Pablo Gamallo, Iñaki Alegria, José Ramom Pichel, and Manex Agirrezabal. 2016. Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 170–177, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties (Gamallo et al., VarDial 2016)
PDF:
https://aclanthology.org/W16-4822.pdf