Building MT for a Severely Under-Resourced Language: White Hmong

William Lewis; Phong Yang

Abstract

In this paper, we discuss the development of statistical machine translation for English to/from White Hmong (language code: mww). White Hmong is a Hmong-Mien language, originally spoken mostly in Southeast Asia, but now predominantly spoken by a large diaspora throughout the world, with populations in the United States, Australia, France, Thailand and elsewhere. Building statistical translation systems for Hmong proved to be incredibly challenging since there are no known parallel or monolingual corpora for the language; in fact, finding data for Hmong proved to be one of the biggest challenges to getting the project off the ground. It was only through close collaboration with the Hmong community, and the active and tireless participation of Hmong speakers, that it became possible to build up a critical mass of data to make the translation project a reality. We see this effort as potentially replicable for other severely resource-poor languages of the world, a category that likely includes the majority of the languages still spoken on the planet. Further, the work here suggests that research and work on other severely under-resourced languages can have significant positive impacts for the affected communities, both for accessibility and language preservation.

Anthology ID: 2012.amta-papers.10
Volume: Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers
Month: October 28-November 1
Year: 2012
Address: San Diego, California, USA
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
URL: https://aclanthology.org/2012.amta-papers.10/
Cite (ACL): William Lewis and Phong Yang. 2012. Building MT for a Severely Under-Resourced Language: White Hmong. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal): Building MT for a Severely Under-Resourced Language: White Hmong (Lewis & Yang, AMTA 2012)
PDF: https://aclanthology.org/2012.amta-papers.10.pdf