Using the Nunavut Hansard Data for Experiments in Morphological Analysis and Machine Translation

Jeffrey Micher


Abstract
Inuktitut is a polysynthetic language spoken in Northern Canada and is one of the official languages of the Canadian territory of Nunavut. As such, the Nunavut Legislature publishes all of its proceedings in parallel English and Inuktitut. Several parallel English-Inuktitut corpora have been created from these proceedings and are publicly available. The corpus used for the current experiments is described. One of these corpora was morphologically processed, and details of the processing are provided. The processed corpus was then used in morphological analysis and machine translation (MT) experiments. The morphological analysis experiments aimed to improve the coverage of morphological processing of the corpus and to compare an additional experimental condition against previously published results. The machine translation experiments used the additional morphologically analyzed word types in a statistical machine translation system designed to translate to and from Inuktitut morphemes. Results are reported, and next steps are defined.
Anthology ID: W18-4807
Volume: Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Venues: COLING | PYLO | WS
Publisher: Association for Computational Linguistics
Pages: 65–72
URL: https://aclanthology.org/W18-4807
Cite (ACL): Jeffrey Micher. 2018. Using the Nunavut Hansard Data for Experiments in Morphological Analysis and Machine Translation. In Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages, pages 65–72, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Using the Nunavut Hansard Data for Experiments in Morphological Analysis and Machine Translation (Micher, 2018)
PDF: https://aclanthology.org/W18-4807.pdf