Minimally Supervised Morphological Segmentation with Applications to Machine Translation

Jason Riesa, David Yarowsky


Abstract
Inflected languages in a low-resource setting present a data sparsity problem for statistical machine translation. In this paper, we present a minimally supervised algorithm for morpheme segmentation on Arabic dialects which reduces unknown words at translation time by over 50%, total vocabulary size by over 40%, and yields a significant increase in BLEU score over a previous state-of-the-art phrase-based statistical MT system.
Anthology ID:
2006.amta-papers.21
Volume:
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
August 8–12
Year:
2006
Address:
Cambridge, Massachusetts, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
185–192
URL:
https://aclanthology.org/2006.amta-papers.21
Cite (ACL):
Jason Riesa and David Yarowsky. 2006. Minimally Supervised Morphological Segmentation with Applications to Machine Translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 185–192, Cambridge, Massachusetts, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Minimally Supervised Morphological Segmentation with Applications to Machine Translation (Riesa & Yarowsky, AMTA 2006)
PDF:
https://aclanthology.org/2006.amta-papers.21.pdf