Transliterating From All Languages

Ann Irvine, Chris Callison-Burch, Alexandre Klementiev


Abstract
Much of the previous work on transliteration has depended on resources and attributes specific to particular language pairs. In this work, rather than focus on a single language pair, we create robust models for transliterating from all languages in a large, diverse set to English. We create training data for 150 languages by mining name pairs from Wikipedia. We train 13 systems and analyze the effects of the amount of training data on transliteration performance. We also present an analysis of the types of errors that the systems make. Our analyses are particularly valuable for building machine translation systems for low-resource languages, where creating and integrating a transliteration module for a language with few NLP resources may provide substantial gains in translation performance.
Anthology ID:
2010.amta-papers.12
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-papers.12
Cite (ACL):
Ann Irvine, Chris Callison-Burch, and Alexandre Klementiev. 2010. Transliterating From All Languages. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Transliterating From All Languages (Irvine et al., AMTA 2010)
PDF:
https://aclanthology.org/2010.amta-papers.12.pdf