Introducing YakuToolkit. Yakut Treebank and Morphological Analyzer.

Tatiana Merzhevich, Fabrício Ferraz Gerardi


Abstract
This poster presents the first publicly available treebank of Yakut, a Turkic language spoken in Russia, and a morphological analyzer for this language. The treebank was annotated following the Universal Dependencies (UD) framework, and the morphological analyzer can directly access and use its data. Yakut is an under-represented language whose prominence can be raised by making reliably annotated data, and NLP tools that can process it, freely accessible. The publication of both the treebank and the analyzer serves this purpose, with the prospect of evolving into a benchmark for the development of online NLP tools for other languages of the Turkic family in the future.
Anthology ID:
2022.sigul-1.24
Volume:
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venue:
SIGUL
SIG:
SIGUL
Publisher:
European Language Resources Association
Pages:
185–188
URL:
https://aclanthology.org/2022.sigul-1.24
Cite (ACL):
Tatiana Merzhevich and Fabrício Ferraz Gerardi. 2022. Introducing YakuToolkit. Yakut Treebank and Morphological Analyzer. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 185–188, Marseille, France. European Language Resources Association.
Cite (Informal):
Introducing YakuToolkit. Yakut Treebank and Morphological Analyzer. (Merzhevich & Ferraz Gerardi, SIGUL 2022)
PDF:
https://aclanthology.org/2022.sigul-1.24.pdf