Lexicon models for hierarchical phrase-based machine translation

Matthias Huck, Saab Mansour, Simon Wiesler, Hermann Ney


Abstract
In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and, given the word alignment matrix, relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a discriminatively trained word lexicon model [4]. We explore source-to-target models with phrase-level as well as sentence-level scoring, and target-to-source models with phrase-level scoring only. For the first two types of lexicon models, we compare several scoring variants. All models are used during search, i.e., they are incorporated directly into the log-linear model combination of the decoder. Phrase table smoothing with triplet lexicon models and with discriminative word lexicons are novel contributions. We also propose a new regularization technique for IBM model 1 that uses the Kullback-Leibler divergence with the empirical unigram distribution as the regularization term. Experiments are carried out on the large-scale NIST Chinese→English translation task and on the English→French and Arabic→English IWSLT TED tasks. For Chinese→English and English→French, we obtain the best results by using the discriminative word lexicon to smooth our phrase tables.
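As a rough illustration of the first model type (a lexicon based on pure relative frequencies over word-aligned data), the following minimal sketch estimates p(e | f) as the count of aligned word pairs divided by the count of aligned links of the source word. It is a generic sketch under stated assumptions, not the paper's implementation: the function name, the toy German-English sentences, and the decision to ignore unaligned words are all hypothetical choices made for this example.

```python
from collections import Counter


def relative_frequency_lexicon(corpus):
    """Estimate p(target_word | source_word) from word-aligned sentence pairs.

    corpus: iterable of (source_tokens, target_tokens, links), where links is
    a set of (source_index, target_index) alignment points.
    Assumption for this sketch: unaligned words are simply ignored.
    """
    pair_counts = Counter()
    source_counts = Counter()
    for src, tgt, links in corpus:
        for i, j in links:
            pair_counts[(src[i], tgt[j])] += 1
            source_counts[src[i]] += 1
    # Relative frequency: aligned pair count over source word link count.
    return {
        (f, e): count / source_counts[f]
        for (f, e), count in pair_counts.items()
    }


# Hypothetical toy data, not taken from the paper's tasks.
toy_corpus = [
    (["das", "haus"], ["the", "house"], {(0, 0), (1, 1)}),
    (["das", "buch"], ["the", "book"], {(0, 0), (1, 1)}),
    (["ein", "haus"], ["a", "home"], {(0, 0), (1, 1)}),
]
lexicon = relative_frequency_lexicon(toy_corpus)
```

In this toy setting, "haus" is linked once each to "house" and "home", so both entries receive probability 0.5, while "das" is always linked to "the". How such word probabilities are combined into phrase-level or sentence-level scores is one of the scoring variants the paper compares and is not shown here.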
Anthology ID:
2011.iwslt-papers.1
Volume:
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers
Month:
December 8-9
Year:
2011
Address:
San Francisco, California
Editors:
Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker
Venue:
IWSLT
SIG:
SIGSLT
Pages:
191–198
URL:
https://aclanthology.org/2011.iwslt-papers.1
Cite (ACL):
Matthias Huck, Saab Mansour, Simon Wiesler, and Hermann Ney. 2011. Lexicon models for hierarchical phrase-based machine translation. In Proceedings of the 8th International Workshop on Spoken Language Translation: Papers, pages 191–198, San Francisco, California.
Cite (Informal):
Lexicon models for hierarchical phrase-based machine translation (Huck et al., IWSLT 2011)
PDF:
https://aclanthology.org/2011.iwslt-papers.1.pdf