Evaluating HeLI with Non-Linear Mappings

Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen


Abstract
In this paper, we describe the non-linear mappings we used with the Helsinki language identification method, HeLI, in the 4th edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2017 workshop. Our SUKI team participated in the closed track together with 10 other teams, and our system reached the 7th position in the track. We describe the HeLI method and the non-linear mappings in mathematical notation. The HeLI method uses a probabilistic model with character n-grams and word-based back-off. We also describe our trials using the non-linear mappings instead of relative frequencies, and we present statistics about the back-off function of the HeLI method.
Anthology ID:
W17-1212
Volume:
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
102–108
URL:
https://aclanthology.org/W17-1212
DOI:
10.18653/v1/W17-1212
Cite (ACL):
Tommi Jauhiainen, Krister Lindén, and Heidi Jauhiainen. 2017. Evaluating HeLI with Non-Linear Mappings. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 102–108, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Evaluating HeLI with Non-Linear Mappings (Jauhiainen et al., VarDial 2017)
PDF:
https://aclanthology.org/W17-1212.pdf