Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
This article describes the language identification approach used by the SUKI team in two shared tasks organized as part of the VarDial 2022 workshop: Identification of Languages and Dialects of Italy, and French Cross-Domain Dialect Identification. We describe the preprocessing techniques we applied to the training data and some of the experiments we ran in preparation for the shared tasks, and we discuss our submissions. Our adaptive system, based on Naive Bayes, ranked first in the Italian language identification task and second in the French variety identification task.
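The abstract names the method but not its mechanics. The sketch below shows the general idea of adaptive (self-training) Naive Bayes language identification: a character n-gram classifier labels the test lines, and the most confidently labeled ones are added to the model under their predicted label before the rest are re-classified. It is not the SUKI team's implementation. The names `AdaptiveNB`, `char_ngrams`, and `adaptive_classify`, along with the trigram length, add-one smoothing, the margin-based confidence score, and the `splits` parameter, are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of a line, padded with one space on each side (n=3 is an assumption)."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class AdaptiveNB:
    """Hypothetical multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total number of n-grams
        self.docs = Counter()               # label -> number of lines seen
        self.vocab = set()

    def add(self, text, label):
        """Add one labeled line (training data or an adapted test line) to the model."""
        grams = char_ngrams(text, self.n)
        self.counts[label].update(grams)
        self.totals[label] += len(grams)
        self.docs[label] += 1
        self.vocab.update(grams)

    def scores(self, text):
        """Log prior plus summed smoothed log likelihoods, per label."""
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        n_docs = sum(self.docs.values())
        out = {}
        for label in self.counts:
            s = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for g in grams:
                s += math.log((self.counts[label][g] + self.alpha) / denom)
            out[label] = s
        return out

    def predict(self, text):
        """Return (best label, confidence), where confidence is the gap to the runner-up."""
        sc = self.scores(text)
        best = max(sc, key=sc.get)
        ranked = sorted(sc.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else float("inf")
        return best, margin


def adaptive_classify(model, test_lines, splits=4):
    """Label all test lines, adding the most confident 1/splits of the remaining
    lines to the model after each round and re-classifying the rest."""
    remaining = list(range(len(test_lines)))
    labels = {}
    chunk = max(1, math.ceil(len(test_lines) / splits))
    while remaining:
        preds = {i: model.predict(test_lines[i]) for i in remaining}
        remaining.sort(key=lambda i: preds[i][1], reverse=True)
        confident, remaining = remaining[:chunk], remaining[chunk:]
        for i in confident:
            labels[i] = preds[i][0]
            model.add(test_lines[i], labels[i])  # adapt the model to the test data
    return [labels[i] for i in range(len(test_lines))]
```

To use it, call `model.add(line, label)` for each training line, then pass the unlabeled test lines to `adaptive_classify`. Because the model absorbs test lines it is already confident about, it can pick up vocabulary specific to the test data before it labels the harder lines. That is a plausible reason to adapt when training and test data differ in domain, although this sketch makes no claim about how much it helps on the shared task data.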
Anthology ID: 2022.vardial-1.13
Volume: Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 119–129
URL: https://aclanthology.org/2022.vardial-1.13
Cite (ACL): Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2022. Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 119–129, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes (Jauhiainen et al., VarDial 2022)
PDF: https://aclanthology.org/2022.vardial-1.13.pdf
Code: tosaja/tunprf