Quantitative Lect Description: A Case Study of Lemko from the Field Data of 1920s-1930s

Ilia Afanasev

Abstract

While qualitative descriptions (in the form of reference grammars) and benchmarks for low-resource languages are becoming increasingly widespread, computational linguists do not often use quantitative methods to describe a new lect rather than a new model. This paper intends to close this lacuna. The case study is a Lemko text transcribed at the beginning of the twentieth century. Using morphosyntactic tagging and topic modelling, the study demonstrates areal influences and archaic features of the lect. Fine-grained evaluation significantly assists in identifying subtle patterns that are not readily apparent through traditional metrics such as accuracy score. The results highlight the necessity of a more detailed analysis of model performance, which may yield more linguistically significant results than a purely manual check. This information is present in the resulting dataset, which can be used for further investigation into the structural features of the Lemko lect.

Anthology ID: 2026.fieldmatters-1.6
Volume: Proceedings of the Fifth Workshop on NLP Applications to Field Linguistics
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: FieldMatters | WS
Publisher: Association for Computational Linguistics
Pages: 46–59
URL: https://aclanthology.org/2026.fieldmatters-1.6/
Cite (ACL): Ilia Afanasev. 2026. Quantitative Lect Description: A Case Study of Lemko from the Field Data of 1920s-1930s. In Proceedings of the Fifth Workshop on NLP Applications to Field Linguistics, pages 46–59, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Quantitative Lect Description: A Case Study of Lemko from the Field Data of 1920s-1930s (Afanasev, FieldMatters 2026)
PDF: https://aclanthology.org/2026.fieldmatters-1.6.pdf
