HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
This paper presents the experiments and results obtained by the SUKI team in the Discriminating between Dutch and Flemish in Subtitles shared task of the VarDial 2018 Evaluation Campaign. Our best submission was ranked 8th, obtaining a macro F1-score of 0.61. Our best results were produced by a language identifier implementing the HeLI method without any modifications. In addition to the best method we used, we describe some of our experiments with unsupervised clustering.
Anthology ID:
W18-3915
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
137–144
URL:
https://aclanthology.org/W18-3915
Cite (ACL):
Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2018. HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 137–144, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles (Jauhiainen et al., VarDial 2018)
PDF:
https://aclanthology.org/W18-3915.pdf