Hamidreza Ghader
2019
An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures
Hamidreza Ghader | Christof Monz
Proceedings of Machine Translation Summit XVII: Research Track
2017
What does Attention in Neural Machine Translation Pay Attention to?
Hamidreza Ghader | Christof Monz
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Attention in neural machine translation makes it possible to encode the relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, no existing work specifically studies attention and analyzes what attention models learn. Thus, the question remains how attention is similar to or different from traditional alignment. In this paper, we provide a detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention is only capable of modelling translational equivalence or whether it captures more information. We show that attention differs from alignment in some cases and captures useful information other than alignments.
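As a rough illustration of what comparing attention with alignment can look like, the sketch below turns soft attention weights into a hard alignment and measures how concentrated each attention distribution is. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the attention matrix, the reference alignment, and the helper names `hard_alignment` and `attention_entropy` are all hypothetical.

```python
import math


def hard_alignment(attention):
    """For each target position, pick the source position with the highest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


def attention_entropy(row):
    """Shannon entropy (in nats) of one attention distribution.

    Higher values mean the attention mass is spread over more source words.
    """
    return -sum(p * math.log(p) for p in row if p > 0)


# Hypothetical attention weights: rows are target words, columns are source words.
attention = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]

# Hypothetical reference alignment: target position i maps to reference[i].
reference = [0, 1, 1]

predicted = hard_alignment(attention)
# Fraction of target positions where the most-attended source word
# matches the reference alignment.
agreement = sum(p == r for p, r in zip(predicted, reference)) / len(reference)
entropies = [attention_entropy(row) for row in attention]
```

In this toy example the third target word has a diffuse attention distribution and its most-attended source word disagrees with the reference alignment. That is the kind of case in which attention and alignment can diverge.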
2016
Which Words Matter in Defining Phrase Reordering Behavior in Statistical Machine Translation?
Hamidreza Ghader | Christof Monz
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track
Lexicalized and hierarchical reordering models use relative frequencies of fully lexicalized phrase pairs to learn phrase reordering distributions. This results in unreliable estimates for infrequent phrase pairs, which also tend to be longer phrases. Existing smoothing techniques for these distributions are unable to address the similarities between phrase pairs and their reordering distributions. We propose two models that use shorter sub-phrase pairs of an original phrase pair to smooth the phrase reordering distributions. In the first model, we follow the classic idea of backing off to shorter histories, commonly used in language model smoothing. In the second model, we use syntactic dependencies to identify the most relevant words in a phrase to back off to. We show how these models can be easily applied to existing lexicalized and hierarchical reordering models. Our models achieve improvements of up to 0.40 BLEU points in Chinese-English translation compared to a baseline which uses a regular lexicalized reordering model and a hierarchical reordering model. The results show that not all the words inside a phrase pair are equally important in defining phrase reordering behavior, and that shortening phrase pairs towards the important words reduces the sparsity problem for long phrase pairs.
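The back-off idea behind the first model can be sketched as a recursive interpolation: the relative frequencies of a long phrase are mixed with the estimate for a shorter sub-phrase, ending in a uniform distribution. This is only an illustrative sketch, not the paper's formulation. The orientation set, the interpolation weight `alpha`, the toy counts, the choice to shorten by dropping the first word, and the simplification of a phrase pair to a single tuple of words are all assumptions made here.

```python
from collections import Counter

# Assumed orientation classes of a lexicalized reordering model.
ORIENTATIONS = ("monotone", "swap", "discontinuous")


def backoff_prob(orientation, phrase, counts, alpha=1.0):
    """Smoothed P(orientation | phrase), backing off to shorter sub-phrases.

    `phrase` is a tuple of words, `counts` maps phrases to orientation counts,
    and `alpha` (hypothetical) controls how much weight the shorter phrase gets.
    """
    if not phrase:
        # Base case: uniform distribution over orientations.
        return 1.0 / len(ORIENTATIONS)
    shorter = phrase[1:]  # assumption: shorten by dropping the first word
    prior = backoff_prob(orientation, shorter, counts, alpha)
    observed = counts.get(phrase, Counter())
    total = sum(observed.values())
    # Rare phrases (small total) lean on the prior; frequent ones on their counts.
    return (observed[orientation] + alpha * prior) / (total + alpha)


# Hypothetical orientation counts for a long phrase and its sub-phrases.
counts = {
    ("the", "red", "house"): Counter({"swap": 1}),
    ("red", "house"): Counter({"monotone": 8, "swap": 2}),
    ("house",): Counter({"monotone": 50, "swap": 10, "discontinuous": 40}),
}

long_phrase = ("the", "red", "house")
smoothed = {o: backoff_prob(o, long_phrase, counts) for o in ORIENTATIONS}
```

With plain relative frequencies, the long phrase seen once would assign all probability to `swap`. After back-off, part of the mass moves to the orientations observed for its shorter sub-phrases. The second model would replace the fixed shortening rule with one that keeps the words judged most relevant by syntactic dependencies, which this sketch does not attempt.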