Building a Morphological Analyser for Laz

Esra Onal, Francis Tyers


Abstract
This study is an attempt to contribute to the documentation and revitalization of Laz, an endangered member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It takes the first steps toward a general computational model for word form recognition and production for Laz by building a rule-based morphological analyser with the Helsinki Finite-State Toolkit (HFST). The evaluation shows that the analyser reaches 64.9% coverage on a corpus of 111,365 tokens collected for this study. We also performed an error analysis on 100 randomly selected tokens from the corpus that the analyser does not cover; the results show that the errors mostly stem from Turkish words in the corpus and from stems missing from our lexicon.
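The coverage figure in the abstract can be read as the share of corpus tokens for which the analyser returns at least one analysis; at 64.9% of 111,365 tokens, that is roughly 72,300 tokens. The following is a minimal sketch of that computation, not the authors' evaluation script. The function `naive_coverage`, the `toy_analyse` lookup, and the placeholder tokens are hypothetical and stand in for a real analyser and corpus.

```python
from typing import Callable, Dict, Iterable, List

def naive_coverage(tokens: Iterable[str],
                   analyse: Callable[[str], List[str]]) -> float:
    """Return the fraction of tokens that receive at least one analysis."""
    total = 0
    covered = 0
    for token in tokens:
        total += 1
        if analyse(token):  # a non-empty list means the token is covered
            covered += 1
    return covered / total if total else 0.0

# Hypothetical stand-in for a transducer lookup: placeholder tokens only,
# not real Laz data.
TOY_LEXICON: Dict[str, List[str]] = {
    "w1": ["w1<n>"],
    "w2": ["w2<v>"],
}

def toy_analyse(token: str) -> List[str]:
    """Look the token up in the toy lexicon; unknown tokens get no analysis."""
    return TOY_LEXICON.get(token, [])

if __name__ == "__main__":
    corpus = ["w1", "w2", "w1", "w3"]  # "w3" is not in the toy lexicon
    print(naive_coverage(corpus, toy_analyse))
```

Counting is done per token rather than per distinct word form, so frequent uncovered words (for example, Turkish words mixed into the corpus) lower the score in proportion to how often they occur.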
Anthology ID:
R19-1101
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
869–877
URL:
https://aclanthology.org/R19-1101
DOI:
10.26615/978-954-452-056-4_101
Cite (ACL):
Esra Onal and Francis Tyers. 2019. Building a Morphological Analyser for Laz. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 869–877, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Building a Morphological Analyser for Laz (Onal & Tyers, RANLP 2019)
PDF:
https://aclanthology.org/R19-1101.pdf