Improving Word Alignment by Exploiting Adapted Word Similarity

Septina Dian Larasati


Abstract
This paper presents a method to improve the word alignment model in a phrase-based Statistical Machine Translation system for a low-resource language using a string similarity approach. Our method captures similar words that can be seen as semi-monolingual across languages, such as numbers, named entities, and adapted/loan words. We measure the monolinguality of the words with several string similarity metrics, including Longest Common Subsequence Ratio (LCSR), Minimum Edit Distance Ratio (MEDR), and a modified BLEU score (modBLEU). Our approach adds intersecting alignment points for word pairs that are orthographically similar, before applying a word alignment heuristic, to generate a better word alignment. We demonstrate this approach on an Indonesian-to-English translation task, where the languages share many similar words that are poorly aligned given limited training data. This approach gives a statistically significant improvement of up to 0.66 in BLEU score.
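The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formulas, so LCSR is assumed here to be the longest-common-subsequence length divided by the longer word's length, and MEDR is assumed to be one minus the Levenshtein distance divided by the longer word's length. The function names, the lowercasing step, the threshold of 0.8, and the example sentence pair are all hypothetical, and modBLEU is omitted because its definition is not stated.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Assumed definition: LCS length over the longer word's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def medr(a: str, b: str) -> float:
    """Assumed definition: 1 - edit distance over the longer word's length."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def similar_points(src_tokens, tgt_tokens, metric=lcsr, threshold=0.8):
    """Return (source index, target index) pairs whose words look alike.

    The threshold is a hypothetical value chosen for illustration only.
    """
    points = set()
    for i, src in enumerate(src_tokens):
        for j, tgt in enumerate(tgt_tokens):
            if metric(src.lower(), tgt.lower()) >= threshold:
                points.add((i, j))
    return points


def add_points(src2tgt, tgt2src, extra):
    """Add the same points to both directional alignments, so that they
    lie in the intersection before a symmetrization heuristic is applied."""
    return set(src2tgt) | extra, set(tgt2src) | extra


# Hypothetical Indonesian-English sentence pair.
src = ["saya", "kuliah", "di", "universitas", "indonesia"]
tgt = ["i", "study", "at", "university", "of", "indonesia"]
points = similar_points(src, tgt)
```

In this sketch, `points` contains the pairs for "universitas"/"university" and "indonesia"/"indonesia". Passing them through `add_points` puts them in both directional alignments, which is one plausible reading of "adding intersecting alignment points" before the alignment heuristic runs.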
Anthology ID:
2012.amta-monomt.5
Volume:
Workshop on Monolingual Machine Translation
Month:
October 28-November 1
Year:
2012
Address:
San Diego, California, USA
Editors:
Tsuyoshi Okita, Artem Sokolov, Taro Watanabe
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2012.amta-monomt.5
Cite (ACL):
Septina Dian Larasati. 2012. Improving Word Alignment by Exploiting Adapted Word Similarity. In Workshop on Monolingual Machine Translation, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Improving Word Alignment by Exploiting Adapted Word Similarity (Larasati, AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-monomt.5.pdf