Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022

Andrea Ceolin


Abstract
We present our contribution to the Identification of Languages and Dialects of Italy (ITDI) shared task, proposed in the VarDial Evaluation Campaign 2022, which asked participants to automatically identify which of the language varieties of Italy a text is written in. The method that yielded the best results in our experiments was a Deep Feedforward Neural Network (DNN) trained on character n-gram counts, which outperformed both Naive Bayes methods and Convolutional Neural Networks (CNNs). The system was among the best methods proposed for the ITDI shared task. The analysis of the results suggests that simple DNNs could be more efficient than CNNs for language identification of closely related varieties.
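To make "a feedforward network trained on character n-gram counts" concrete, here is a minimal, dependency-free sketch. The abstract does not give the n-gram range, network depth or width, or training procedure, so everything here is an illustrative assumption and not the configuration reported in the paper: n-grams of length 1 to 3 with space padding, a single tanh hidden layer (the paper's DNN is deeper), softmax output, and plain SGD on cross-entropy. The names `char_ngrams`, `vectorize`, `FeedforwardLID`, and the toy labels `variety_1`/`variety_2` are hypothetical, and the training strings are made up.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of length n_min..n_max (text padded with spaces)."""
    padded = f" {text.lower()} "  # padding lets n-grams capture word boundaries
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def vectorize(counts, vocab):
    """Map an n-gram Counter onto a dense count vector; unseen n-grams are dropped."""
    vec = [0.0] * len(vocab)
    for gram, c in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = float(c)
    return vec


class FeedforwardLID:
    """Hypothetical classifier: one tanh hidden layer + softmax, trained with SGD."""

    def __init__(self, n_in, n_hidden, labels, seed=0):
        rng = random.Random(seed)
        self.labels = labels
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in labels]
        self.b2 = [0.0] * len(labels)

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for a numerically stable softmax
        e = [math.exp(zi - m) for zi in z]
        s = sum(e)
        return h, [ei / s for ei in e]

    def train_step(self, x, y, lr=0.1):
        h, p = self._forward(x)
        # Gradient of cross-entropy w.r.t. the logits: p - one_hot(y)
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate into the hidden layer (tanh' = 1 - h^2) before touching w2
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k, dzk in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dzk * hj
            self.b2[k] -= lr * dzk
        for j, dhj in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dhj * xi
            self.b1[j] -= lr * dhj

    def predict(self, x):
        _, p = self._forward(x)
        return self.labels[p.index(max(p))]


# Hypothetical toy data: two made-up "varieties" with non-overlapping letters.
train = [("abab abab", "variety_1"), ("abba ab", "variety_1"),
         ("cdcd cd", "variety_2"), ("dcc dcd", "variety_2")]
grams = [char_ngrams(t) for t, _ in train]
vocab = {g: i for i, g in enumerate(sorted(set().union(*grams)))}
labels = sorted({lab for _, lab in train})
model = FeedforwardLID(len(vocab), n_hidden=8, labels=labels)
data = [(vectorize(g, vocab), labels.index(lab)) for g, (_, lab) in zip(grams, train)]
for _ in range(200):  # epochs over the toy set
    for x, y in data:
        model.train_step(x, y)
print(model.predict(vectorize(char_ngrams("ab ab"), vocab)))
```

The input keeps raw n-gram counts rather than binary presence, matching the abstract's "character n-gram counts"; a real system would also need a much larger vocabulary (typically pruned by frequency), more layers, and a held-out split, none of which this sketch attempts.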
Anthology ID:
2022.vardial-1.11
Volume:
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
99–108
URL:
https://aclanthology.org/2022.vardial-1.11
Cite (ACL):
Andrea Ceolin. 2022. Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 99–108, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Neural Networks for Cross-domain Language Identification. Phlyers @Vardial 2022 (Ceolin, VarDial 2022)
PDF:
https://aclanthology.org/2022.vardial-1.11.pdf