Naive Bayes-based Experiments in Romanian Dialect Identification

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
This article describes the experiments and systems developed by the SUKI team for the second edition of the Romanian Dialect Identification (RDI) shared task, which was organized as part of the 2021 VarDial Evaluation Campaign. We submitted two runs to the shared task, and our second submission was the best overall submission by a noticeable margin. Our best submission used a naive Bayes classifier based on character n-grams, with adaptive language models. We describe the experiments on the development set that led to both submissions.
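The abstract names the technique only at a high level. The sketch below illustrates its two ingredients: a multinomial naive Bayes classifier over character n-grams, and one simple way to make its language models adaptive. Adaptation here means labelling the most confidently classified test texts first and adding their n-grams to the predicted class before classifying the rest. Several details are assumptions rather than the SUKI team's published configuration: the n-gram range (1 to 4), additive smoothing with `alpha=0.1`, uniform class priors, the score-margin confidence measure, and the fixed number of adaptation rounds. The names `char_ngrams`, `CharNgramNB` and `adaptive_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, min_n=1, max_n=4):
    """Count character n-grams of lengths min_n..max_n.

    Assumption: the text is padded with one space on each side so that
    n-grams at word boundaries are also captured."""
    padded = f" {text} "
    grams = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class CharNgramNB:
    """Multinomial naive Bayes over character n-grams (uniform priors,
    additive smoothing; both are assumptions of this sketch)."""

    def __init__(self, min_n=1, max_n=4, alpha=0.1):
        self.min_n, self.max_n, self.alpha = min_n, max_n, alpha
        self.counts = defaultdict(Counter)  # class -> n-gram counts
        self.totals = Counter()             # class -> total n-gram tokens
        self.vocab = set()                  # all n-grams seen in any class

    def add(self, text, label):
        """Add a text's n-grams to the model of class `label`."""
        grams = char_ngrams(text, self.min_n, self.max_n)
        self.counts[label].update(grams)
        self.totals[label] += sum(grams.values())
        self.vocab.update(grams)

    def scores(self, text):
        """Log-likelihood of the text under each class model."""
        grams = char_ngrams(text, self.min_n, self.max_n)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        out = {}
        for label, counts in self.counts.items():
            denom = self.totals[label] + self.alpha * v
            out[label] = sum(
                c * math.log((counts[g] + self.alpha) / denom)
                for g, c in grams.items()
            )
        return out

    def predict(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


def adaptive_predict(model, texts, rounds=4):
    """Hypothetical adaptation loop: in each round, label the most confident
    share of the remaining texts (largest margin between the best and the
    second-best class score) and add them to the predicted class model."""
    pending = list(range(len(texts)))
    labels = [None] * len(texts)
    per_round = max(1, math.ceil(len(texts) / rounds))
    while pending:
        ranked = []
        for i in pending:
            s = sorted(model.scores(texts[i]).items(),
                       key=lambda kv: kv[1], reverse=True)
            margin = s[0][1] - s[1][1] if len(s) > 1 else float("inf")
            ranked.append((margin, i, s[0][0]))
        ranked.sort(reverse=True)  # most confident first
        chosen = ranked[:per_round]
        for _, i, label in chosen:
            labels[i] = label
            model.add(texts[i], label)  # adapt the winning class model
        taken = {i for _, i, _ in chosen}
        pending = [i for i in pending if i not in taken]
    return labels


if __name__ == "__main__":
    # Toy, made-up data; real experiments would train on the MOROCO corpus.
    nb = CharNgramNB()
    nb.add("aaaa aaaa aaaa", "A")
    nb.add("bbbb bbbb bbbb", "B")
    print(nb.predict("aaa"))  # A
    print(adaptive_predict(nb, ["aa a", "bb b"]))
```

Adding test texts to the models changes both the class totals and the shared vocabulary, so texts left for later rounds are scored against models that already include the confidently labelled ones. That is the intended effect of adaptation in this sketch.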
Anthology ID: 2021.vardial-1.9
Volume: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 76–83
URL: https://aclanthology.org/2021.vardial-1.9
Cite (ACL): Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2021. Naive Bayes-based Experiments in Romanian Dialect Identification. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 76–83, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Naive Bayes-based Experiments in Romanian Dialect Identification (Jauhiainen et al., VarDial 2021)
PDF: https://aclanthology.org/2021.vardial-1.9.pdf
Data: MOROCO