Exploring Classifier Combinations for Language Variety Identification

Tim Kreutz, Walter Daelemans


Abstract
This paper describes CLiPS’s submissions to the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. We explore different ways to combine classifiers trained on different feature groups. Our best system uses two linear SVM classifiers: one trained on lexical features (word n-grams) and one trained on syntactic features (PoS n-grams). The final prediction of whether a document is Flemish Dutch or Netherlandic Dutch is made by whichever classifier outputs the highest probability for either of the two labels. This confidence vote approach outperforms a meta-classifier on both the development data and the test data.
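The confidence vote can be illustrated with a minimal Python sketch. It assumes that each classifier already produces one probability per label for a given document. How the SVM scores are turned into probabilities is not covered here. The function name confidence_vote and the label names BEL and DUT are hypothetical and chosen only for illustration. Ties are resolved in favour of the classifier listed first.

```python
from typing import Sequence, Tuple

# Hypothetical label names: BEL = Flemish Dutch, DUT = Netherlandic Dutch.
LABELS = ("BEL", "DUT")


def confidence_vote(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
) -> Tuple[str, int]:
    """Return (label, classifier_index) chosen by the most confident classifier.

    prob_rows[i][j] is classifier i's probability for labels[j] on one document,
    e.g. row 0 from a word n-gram model and row 1 from a PoS n-gram model.
    """
    best_clf, best_label, best_p = -1, -1, float("-inf")
    for i, row in enumerate(prob_rows):
        if len(row) != len(labels):
            raise ValueError("each row needs one probability per label")
        for j, p in enumerate(row):
            # Strict '>' keeps the earlier classifier on ties.
            if p > best_p:
                best_clf, best_label, best_p = i, j, p
    if best_clf < 0:
        raise ValueError("no classifier outputs given")
    return labels[best_label], best_clf


if __name__ == "__main__":
    lexical = [0.55, 0.45]    # word n-gram classifier, only mildly confident
    syntactic = [0.20, 0.80]  # PoS n-gram classifier, more confident
    print(confidence_vote([lexical, syntactic]))
```

Here the PoS n-gram classifier's 0.80 for DUT is the single highest probability, so its prediction wins even though the lexical classifier leans towards BEL. A meta-classifier would instead learn a mapping from both probability vectors to a label, which requires additional training data.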
Anthology ID:
W18-3922
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
191–198
URL:
https://aclanthology.org/W18-3922
Cite (ACL):
Tim Kreutz and Walter Daelemans. 2018. Exploring Classifier Combinations for Language Variety Identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 191–198, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Exploring Classifier Combinations for Language Variety Identification (Kreutz & Daelemans, VarDial 2018)
PDF:
https://aclanthology.org/W18-3922.pdf