The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group

Ilia Afanasev


Abstract
The study of low-resourced East Slavic lects is becoming increasingly relevant: they face the prospect of extinction under the pressure of standard Russian, while academia treats them as an inferior part of that lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a clear example of a lect treated this way. We take an alternative approach and study East Slavic lects (such as Khislavichi) as separate systems. The proposed method includes the development of a tagged corpus through morphological tagging with models trained on the bigger lects. The morphological tagging results may then be used to place these lects among the bigger ones, such as standard Belarusian or standard Russian. The implemented morphological taggers of standard Russian and standard Belarusian demonstrate an accuracy 3 to 15% higher than that of multilingual models. The study suggests possible ways to adapt these taggers to the Khislavichi dataset, such as tagset unification and a transcription closer to the actual sound than to the standard lect pronunciation. Automatic classification supports the hypothesis that Khislavichi is a border East Slavic lect that was historically Belarusian but became russified: the algorithm places it either slightly closer to Russian or to Belarusian.
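The idea of using tagging results to place a small lect among bigger ones can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each bigger-lect tagger has already produced part-of-speech tags for the same gold-tagged sample, and it simply picks the lect whose tagger agrees with the gold tags most often. The function names (`tagging_accuracy`, `closest_lect`), the lect labels, and the toy tag sequences are all hypothetical and invented for illustration only.

```python
from typing import Dict, Sequence, Tuple


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def closest_lect(
    gold: Sequence[str],
    predictions_by_lect: Dict[str, Sequence[str]],
) -> Tuple[str, Dict[str, float]]:
    """Score each tagger against the gold tags and return the best-scoring lect.

    Assumption: higher tagging accuracy is read as greater closeness
    between the tagger's training lect and the tagged lect.
    """
    scores = {
        lect: tagging_accuracy(gold, tags)
        for lect, tags in predictions_by_lect.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (invented, not taken from the Khislavichi corpus).
gold_tags = ["NOUN", "VERB", "ADP", "NOUN", "PUNCT"]
predictions = {
    "standard_russian": ["NOUN", "VERB", "ADP", "ADJ", "PUNCT"],
    "standard_belarusian": ["NOUN", "AUX", "ADP", "ADJ", "PUNCT"],
}

best_lect, accuracy_by_lect = closest_lect(gold_tags, predictions)
print(best_lect, accuracy_by_lect)
```

Token-level accuracy is only one possible closeness signal; a real comparison would also need a unified tagset across the taggers, as the abstract notes.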
Anthology ID:
2023.vardial-1.18
Volume:
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
174–186
URL:
https://aclanthology.org/2023.vardial-1.18
DOI:
10.18653/v1/2023.vardial-1.18
Cite (ACL):
Ilia Afanasev. 2023. The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group. In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), pages 174–186, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group (Afanasev, VarDial 2023)
PDF:
https://aclanthology.org/2023.vardial-1.18.pdf
Video:
https://aclanthology.org/2023.vardial-1.18.mp4