A finite-state morphological transducer for Kyrgyz

Jonathan Washington, Mirlan Ipasov, Francis Tyers


Abstract
This paper describes the development of a free/open-source finite-state morphological transducer for Kyrgyz. The transducer has been developed for morphological generation for use within a prototype Turkish→Kyrgyz machine translation system, but has also been extensively tested for analysis. The finite-state toolkit used for the work was the Helsinki Finite-State Toolkit (HFST). The paper describes some issues in Kyrgyz morphology, the development of the tool, some linguistic issues encountered and how they were dealt with, and which issues are left to resolve. An evaluation is presented which shows that the transducer has medium-level coverage, between 82% and 87% on two freely available corpora of Kyrgyz, and high precision and recall over a manually verified test set.
Anthology ID:
L12-1642
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
934–940
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1077_Paper.pdf
Cite (ACL):
Jonathan Washington, Mirlan Ipasov, and Francis Tyers. 2012. A finite-state morphological transducer for Kyrgyz. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 934–940, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A finite-state morphological transducer for Kyrgyz (Washington et al., LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1077_Paper.pdf