German Dialect Identification in Interview Transcriptions

Shervin Malmasi, Marcos Zampieri


Abstract
This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017. The task consists of training models to identify the dialect of Swiss-German speech transcripts. The dialects included in the GDI dataset are Basel, Bern, Lucerne, and Zurich. The three systems we submitted are based on a plurality ensemble, a mean probability ensemble, and a meta-classifier trained on character and word n-grams. The best results were obtained by the meta-classifier, which achieved 68.1% accuracy and a 66.2% F1-score and ranked first among the 10 teams that participated in the GDI shared task.
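To make the difference between the first two fusion rules concrete, here is a minimal Python sketch, not the authors' implementation. The base-classifier outputs are invented for illustration, the function names `plurality_vote` and `mean_probability` are hypothetical, and the tie-breaking rule is an arbitrary assumption. The meta-classifier, which would be trained on the base classifiers' outputs, is not shown.

```python
from collections import Counter

DIALECTS = ["Basel", "Bern", "Lucerne", "Zurich"]  # the four GDI classes


def plurality_vote(labels):
    """Return the label predicted by the most base classifiers.

    Ties go to the tied label that appears first in `labels`
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def mean_probability(distributions):
    """Average per-class probabilities over classifiers, then take the argmax."""
    n = len(distributions)
    means = [sum(d[i] for d in distributions) / n for i in range(len(DIALECTS))]
    return DIALECTS[means.index(max(means))]


# Invented outputs of three base classifiers for a single transcript.
probs = [
    [0.40, 0.35, 0.15, 0.10],  # classifier A narrowly favours Basel
    [0.45, 0.40, 0.10, 0.05],  # classifier B narrowly favours Basel
    [0.05, 0.90, 0.03, 0.02],  # classifier C strongly favours Bern
]
hard_labels = [DIALECTS[p.index(max(p))] for p in probs]

print(plurality_vote(hard_labels))  # Basel: two of three votes
print(mean_probability(probs))      # Bern: mean 0.55 vs. 0.30 for Basel
```

In this toy case the two rules disagree: plurality voting counts only each classifier's top choice, while the mean probability rule lets one confident classifier outweigh two uncertain ones.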
Anthology ID:
W17-1220
Volume:
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
164–169
URL:
https://aclanthology.org/W17-1220/
DOI:
10.18653/v1/W17-1220
Cite (ACL):
Shervin Malmasi and Marcos Zampieri. 2017. German Dialect Identification in Interview Transcriptions. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 164–169, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
German Dialect Identification in Interview Transcriptions (Malmasi & Zampieri, VarDial 2017)
PDF:
https://aclanthology.org/W17-1220.pdf