Attention Transformer Model for Translation of Similar Languages

Farhan Dhanani, Muhammad Rafi


Abstract
This paper describes our approach to the shared task on similar language translation at the Fifth Conference on Machine Translation (WMT-20). Our motivation comes from the latest state of the art in neural machine translation, in which Transformers and recurrent attention models are used effectively. A typical sequence-to-sequence architecture consists of an encoder and a decoder recurrent neural network (RNN): the encoder recursively processes the source sequence and reduces it to a fixed-length vector (context), and the decoder generates the target sequence token by token, conditioned on that context. Transformers, in contrast, reduce training time by offering a higher degree of parallelism, at the cost of explicit sequential order. Recurrent attention, in turn, lets the decoder focus on the order of the source sequence at each decoding step. In our approach, we combine a recurrence-based layered encoder-decoder model with the Transformer model. The resulting Attention Transformer model enjoys the benefits of both recurrent attention and the Transformer, quickly learning the most probable sequence for decoding into the target language. The architecture is especially suited to similar languages (languages coming from the same family). We submitted systems for both the forward (Hindi to Marathi) and reverse (Marathi to Hindi) Indo-Aryan language pairs. Our system was trained on the parallel corpus provided by the organizers and achieved an average BLEU score of 3.68 with a TER score of 97.64 on the Hindi-Marathi test set, along with a BLEU score of 9.02 and a TER score of 88.6 on the Marathi-Hindi test set.
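To make the hybrid architecture described above concrete, the following is a minimal, illustrative sketch in PyTorch of a decoder block that pairs a recurrent layer over the target prefix with Transformer-style multi-head cross-attention over the encoder outputs. This is not the authors' released implementation (see the Code link below); the class name, layer choices, and hyperparameters are assumptions chosen for illustration only.

# Minimal sketch (not the authors' released code) of a decoder block that
# combines recurrence over the target prefix with Transformer-style
# multi-head attention over the source encodings. All sizes are illustrative.
import torch
import torch.nn as nn


class RecurrentAttentionDecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Recurrence preserves the sequential order of the target prefix.
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        # Cross-attention lets each decoding step attend to the source sequence.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt_embeds, enc_outputs):
        # tgt_embeds:  (batch, tgt_len, d_model) embedded target prefix
        # enc_outputs: (batch, src_len, d_model) encoder states of the source
        rnn_out, _ = self.rnn(tgt_embeds)
        x = self.norm1(tgt_embeds + self.dropout(rnn_out))
        attn_out, _ = self.cross_attn(query=x, key=enc_outputs, value=enc_outputs)
        x = self.norm2(x + self.dropout(attn_out))
        return x + self.dropout(self.ffn(x))


if __name__ == "__main__":
    block = RecurrentAttentionDecoderBlock()
    enc = torch.randn(2, 20, 512)   # dummy source encodings
    tgt = torch.randn(2, 15, 512)   # dummy target prefix embeddings
    print(block(tgt, enc).shape)    # torch.Size([2, 15, 512])

The recurrent layer encodes sequential order directly, while the multi-head attention over encoder outputs can be computed in parallel across positions, which reflects the trade-off between RNNs and Transformers that the abstract discusses.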
Anthology ID:
2020.wmt-1.43
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
387–392
URL:
https://aclanthology.org/2020.wmt-1.43
Cite (ACL):
Farhan Dhanani and Muhammad Rafi. 2020. Attention Transformer Model for Translation of Similar Languages. In Proceedings of the Fifth Conference on Machine Translation, pages 387–392, Online. Association for Computational Linguistics.
Cite (Informal):
Attention Transformer Model for Translation of Similar Languages (Dhanani & Rafi, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.43.pdf
Video:
 https://slideslive.com/38939592
Code
 farhandhanani/wmt-20-submission-shared-task-similar-language-translation
Data
WMT 2020