Tag Assisted Neural Machine Translation of Film Subtitles

Aren Siekmeier, WonKee Lee, Hongseok Kwon, Jong-Hyeok Lee


Abstract
We implemented a neural machine translation system that uses automatic sequence tagging to improve the quality of translation. Instead of operating on unannotated sentence pairs, our system uses pre-trained tagging systems to add linguistic features to source and target sentences. Our proposed neural architecture learns a combined embedding of tokens and tags in the encoder, and simultaneous token and tag prediction in the decoder. Compared to a baseline with unannotated training, this architecture increased the BLEU score of German to English film subtitle translation outputs by 1.61 points using named entity tags; however, the BLEU score decreased by 0.38 points using part-of-speech tags. This demonstrates that certain token-level tag outputs from off-the-shelf tagging systems can improve the output of neural translation systems using our combined embedding and simultaneous decoding extensions.
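The two extensions named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: combining token and tag embeddings by concatenation, the two linear output heads, and all sizes and names below are assumptions made only to show the shape of the idea.

```python
import random

random.seed(0)

def make_table(rows, dim):
    """Random lookup table: one dim-sized vector per vocabulary entry."""
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]

def linear(weights, vec):
    """Matrix-vector product; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical vocabulary and embedding sizes (not taken from the paper).
TOKEN_VOCAB, TAG_VOCAB = 10, 4
TOKEN_DIM, TAG_DIM = 6, 2
HIDDEN = TOKEN_DIM + TAG_DIM

token_emb = make_table(TOKEN_VOCAB, TOKEN_DIM)
tag_emb = make_table(TAG_VOCAB, TAG_DIM)

def embed(token_ids, tag_ids):
    """Encoder input: concatenate each token vector with its tag vector.

    Concatenation is an assumption; summing is another common choice.
    """
    return [token_emb[t] + tag_emb[g] for t, g in zip(token_ids, tag_ids)]

# Two output heads that read the same decoder state (assumed design).
token_head = make_table(TOKEN_VOCAB, HIDDEN)
tag_head = make_table(TAG_VOCAB, HIDDEN)

def predict(state):
    """Decoder output: choose a token id and a tag id from one state."""
    return argmax(linear(token_head, state)), argmax(linear(tag_head, state))

# Three source tokens, each paired with a tag id from a pre-trained tagger.
source = embed([3, 7, 1], [0, 2, 0])
# A combined embedding stands in for a real decoder hidden state here.
token_id, tag_id = predict(source[0])
```

In a trained system the two heads would share the decoder and be optimized jointly, so the tag prediction acts as an auxiliary signal for the token prediction.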
Anthology ID:
2021.iwslt-1.30
Volume:
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Month:
August
Year:
2021
Address:
Bangkok, Thailand (online)
Editors:
Marcello Federico, Alex Waibel, Marta R. Costa-jussà, Jan Niehues, Sebastian Stüker, Elizabeth Salesky
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
255–262
URL:
https://aclanthology.org/2021.iwslt-1.30
DOI:
10.18653/v1/2021.iwslt-1.30
Cite (ACL):
Aren Siekmeier, WonKee Lee, Hongseok Kwon, and Jong-Hyeok Lee. 2021. Tag Assisted Neural Machine Translation of Film Subtitles. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pages 255–262, Bangkok, Thailand (online). Association for Computational Linguistics.
Cite (Informal):
Tag Assisted Neural Machine Translation of Film Subtitles (Siekmeier et al., IWSLT 2021)
PDF:
https://aclanthology.org/2021.iwslt-1.30.pdf
Code:
compwiztobe/tagged-seq2seq