Two-stage Pipeline for Multilingual Dialect Detection

Ankit Vaidya, Aditya Kane


Abstract
Dialect identification is a crucial task for localizing large language models. This paper outlines our approach to the VarDial 2023 shared task, in which systems must identify three dialects (Track-1) or two dialects (Track-2) for each of three languages, yielding a 9-way classification for Track-1 and a 6-way classification for Track-2. Our proposed approach is a two-stage system that outperforms other participants' systems and previous work in this domain. We achieve a score of 58.54% on Track-1 and 85.61% on Track-2. Our codebase is publicly available (https://github.com/ankit-vaidya19/EACL_VarDial2023).
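The label counts and the two-stage idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say what each stage does, so the split into a language classifier followed by a per-language dialect classifier is an assumption, and every name here (`TRACK1`, `TRACK2`, `count_labels`, `two_stage_predict`, the toy classifiers, and the placeholder labels) is hypothetical.

```python
from typing import Callable, Dict, List

# Placeholder label inventories (assumption: the abstract does not name the
# languages or dialects, so generic names are used here).
TRACK1: Dict[str, List[str]] = {
    "lang_a": ["d1", "d2", "d3"],
    "lang_b": ["d1", "d2", "d3"],
    "lang_c": ["d1", "d2", "d3"],
}
# Track-2 keeps two dialect labels per language.
TRACK2: Dict[str, List[str]] = {
    lang: dialects[:2] for lang, dialects in TRACK1.items()
}


def count_labels(inventory: Dict[str, List[str]]) -> int:
    """Total number of (language, dialect) classes in a flat classification."""
    return sum(len(dialects) for dialects in inventory.values())


def two_stage_predict(
    text: str,
    language_clf: Callable[[str], str],
    dialect_clfs: Dict[str, Callable[[str], str]],
) -> str:
    """Assumed two-stage routing: pick a language, then a dialect within it."""
    language = language_clf(text)           # stage 1: choose the language
    dialect = dialect_clfs[language](text)  # stage 2: language-specific model
    return f"{language}/{dialect}"


# Toy stand-ins for trained models, used only to make the sketch runnable.
def toy_language_clf(text: str) -> str:
    return "lang_a" if "a" in text else "lang_b"


toy_dialect_clfs: Dict[str, Callable[[str], str]] = {
    "lang_a": lambda text: "d1" if len(text) < 10 else "d2",
    "lang_b": lambda text: "d3",
}

print(count_labels(TRACK1), count_labels(TRACK2))
print(two_stage_predict("abc", toy_language_clf, toy_dialect_clfs))
```

Under this assumed design, the second-stage model only has to separate the dialects of a single language, rather than all 9 (or 6) classes at once.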
Anthology ID:
2023.vardial-1.22
Volume:
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
222–229
URL:
https://aclanthology.org/2023.vardial-1.22
DOI:
10.18653/v1/2023.vardial-1.22
Cite (ACL):
Ankit Vaidya and Aditya Kane. 2023. Two-stage Pipeline for Multilingual Dialect Detection. In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), pages 222–229, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Two-stage Pipeline for Multilingual Dialect Detection (Vaidya & Kane, VarDial 2023)
PDF:
https://aclanthology.org/2023.vardial-1.22.pdf
Video:
https://aclanthology.org/2023.vardial-1.22.mp4