Experiments in Language Variety Geolocation and Dialect Identification

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
In this paper, we describe the systems we used in the VarDial Evaluation Campaign, organized as part of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects. We participated in two shared tasks: the second edition of Romanian Dialect Identification (RDI) and the first edition of Social Media Variety Geolocation (SMG). The submissions of our SUKI team used generative language models based on Naive Bayes and character n-grams.
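The abstract names the general technique: a generative classifier that combines Naive Bayes with character n-grams. The sketch below illustrates that technique. It is a generic multinomial Naive Bayes over character n-grams with additive smoothing, and it is not the SUKI team's actual system. The class and function names (`NaiveBayesNgram`, `char_ngrams`), the n-gram range (1 to 4), the space padding, and the smoothing constant `alpha=1.0` are all illustrative assumptions. Each label is scored by its log prior plus the summed log probabilities of the input's n-grams under that label, and the highest-scoring label wins.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=4):
    """Yield every character n-gram of length n_min..n_max (hypothetical helper).

    The text is padded with one space on each side so that n-grams can also
    capture word boundaries at the start and end of the input.
    """
    padded = f" {text} "
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


class NaiveBayesNgram:
    """Illustrative multinomial Naive Bayes over character n-grams (assumed design)."""

    def __init__(self, n_min=1, n_max=4, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.totals = Counter()             # label -> total n-gram tokens seen
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-gram types seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = Counter(char_ngrams(text, self.n_min, self.n_max))
            self.counts[label].update(grams)
            self.totals[label] += sum(grams.values())
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_score(self, text, label):
        # Log class prior estimated from training-document frequencies.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        # Additive (Laplace-style) smoothing keeps unseen n-grams from zeroing out the product.
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for g in char_ngrams(text, self.n_min, self.n_max):
            score += math.log((self.counts[label][g] + self.alpha) / denom)
        return score

    def predict(self, text):
        # Pick the label with the highest posterior log score.
        return max(self.docs, key=lambda label: self.log_score(text, label))
```

On a toy training set with artificial labels `"A"` and `"B"`, `NaiveBayesNgram().fit(["aaaa aaa", "aa aaaa", "bbbb bbb", "bb bbbb"], ["A", "A", "B", "B"]).predict("aaa")` returns `"A"`. Working in log space avoids floating-point underflow when many n-gram probabilities are multiplied together.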
Anthology ID:
2020.vardial-1.21
Volume:
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue:
VarDial
Publisher:
International Committee on Computational Linguistics (ICCL)
Pages:
220–231
URL:
https://aclanthology.org/2020.vardial-1.21
Cite (ACL):
Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2020. Experiments in Language Variety Geolocation and Dialect Identification. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 220–231, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal):
Experiments in Language Variety Geolocation and Dialect Identification (Jauhiainen et al., VarDial 2020)
PDF:
https://aclanthology.org/2020.vardial-1.21.pdf
Data:
MOROCO