BAM: A combination of deep and shallow models for German Dialect Identification.

Andrei M. Butnaru


Abstract
*This is a submission for the Third VarDial Evaluation Campaign.* In this paper, we present a machine learning approach for the German Dialect Identification (GDI) Closed Shared Task of the DSL 2019 Challenge. The proposed approach combines deep and shallow models by applying a voting scheme to the outputs of a Character-level Convolutional Neural Network (Char-CNN), a Long Short-Term Memory (LSTM) network, and a model based on String Kernels. The first model is a Char-CNN that merges multiple convolutions computed with kernels of different sizes. The second model is an LSTM network that applies global max pooling over the sequence of outputs returned across time steps. Both models pass the resulting activation maps to two fully-connected layers. The final model is based on String Kernels computed on character p-grams extracted from speech transcripts. It combines two blended kernel functions: the presence bits kernel and the intersection kernel. The empirical results of the shared task show that the approach performs well: the proposed system ranked fourth, with a macro-F1 score of 62.55%.
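As a rough illustration of the shallow component and the voting step, the sketch below computes the presence bits kernel and the intersection kernel on character p-grams using their standard definitions, and takes a plain majority vote over three predicted labels. It is not the paper's implementation: the function names, the p-gram range, the tie-breaking rule, and the example labels are all assumptions made for illustration, and the abstract does not say how the two kernels are combined or how the votes are weighted.

```python
from collections import Counter


def pgram_counts(text, p):
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def presence_bits_kernel(a, b, p):
    """Number of distinct p-grams that occur in both strings."""
    ca, cb = pgram_counts(a, p), pgram_counts(b, p)
    return sum(1 for gram in ca if gram in cb)


def intersection_kernel(a, b, p):
    """Sum, over shared p-grams, of the smaller of the two occurrence counts."""
    ca, cb = pgram_counts(a, p), pgram_counts(b, p)
    return sum(min(n, cb[gram]) for gram, n in ca.items() if gram in cb)


def blended(kernel, a, b, p_values):
    """Blend a kernel by summing it over several p-gram lengths.

    The range of p used here is an assumption, not taken from the paper.
    """
    return sum(kernel(a, b, p) for p in p_values)


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    x, y = "abab", "ab"
    print(presence_bits_kernel(x, y, 2))
    print(intersection_kernel(x, x, 2))
    print(blended(intersection_kernel, x, y, range(1, 3)))
    # Placeholder labels standing in for the three models' predictions.
    print(majority_vote(["BE", "ZH", "BE"]))
```

In a full pipeline, the kernel values would fill a Gram matrix passed to a kernel classifier, and the vote would be taken over the predictions of the Char-CNN, the LSTM, and the string kernel model for each test sample.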
Anthology ID:
W19-1413
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Editors:
Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
128–137
URL:
https://aclanthology.org/W19-1413
DOI:
10.18653/v1/W19-1413
Cite (ACL):
Andrei M. Butnaru. 2019. BAM: A combination of deep and shallow models for German Dialect Identification. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 128–137, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
BAM: A combination of deep and shallow models for German Dialect Identification. (Butnaru, VarDial 2019)
PDF:
https://aclanthology.org/W19-1413.pdf