Challenges in Neural Language Identification: NRC at VarDial 2020

Gabriel Bernier-Colborne, Cyril Goutte


Abstract
We describe the systems developed by the National Research Council Canada for the Uralic language identification shared task at the 2020 VarDial evaluation campaign. Although our official results were well below the baseline, we show in this paper that this was not due to the neural approach to language identification in general, but to a flaw in the function we used to sample data for training and evaluation purposes. Preliminary experiments conducted after the evaluation period suggest that our neural approach to language identification can achieve state-of-the-art results on this task, although further experimentation is required.
Anthology ID:
2020.vardial-1.26
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue:
VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
273–282
URL:
https://aclanthology.org/2020.vardial-1.26
Cite (ACL):
Gabriel Bernier-Colborne and Cyril Goutte. 2020. Challenges in Neural Language Identification: NRC at VarDial 2020. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 273–282, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Challenges in Neural Language Identification: NRC at VarDial 2020 (Bernier-Colborne & Goutte, VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.26.pdf