Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model

Poorya Zaremoodi, Gholamreza Haffari


Abstract
Incorporating syntactic information in Neural Machine Translation (NMT) can lead to better reorderings, which is particularly useful when the language pairs are syntactically highly divergent or when the training bitext is not large. Previous work on using syntactic information, provided by top-1 parse trees generated by (inevitably error-prone) parsers, has been promising. In this paper, we propose a forest-to-sequence NMT model to make use of exponentially many parse trees of the source sentence to compensate for parser errors. Our method represents the collection of parse trees as a packed forest, and learns a neural transducer to translate from the input forest to the target sentence. Experiments on English to German, Chinese and Farsi translation tasks show the superiority of our approach over the sequence-to-sequence and tree-to-sequence neural translation models.
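The abstract's key idea is that a packed forest can represent exponentially many parse trees while sharing common subtrees. The following is a minimal illustrative sketch of such a structure (not the paper's actual implementation; the `Node` class and `num_trees` helper are hypothetical): each node spans a substring, and each incoming hyperedge records one way of building that span from child nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A forest node spanning words [start, end) with a syntactic label."""
    label: str
    start: int
    end: int
    # Each incoming hyperedge is one derivation of this span: a list of children.
    incoming: list = field(default_factory=list)

def num_trees(node: Node) -> int:
    """Count the parse trees packed under this node.

    Enumerating these trees explicitly can be exponential in sentence length,
    but the forest stores them compactly via shared subtrees.
    """
    if not node.incoming:  # leaf: a single word
        return 1
    total = 0
    for children in node.incoming:
        combo = 1
        for child in children:
            combo *= num_trees(child)
        total += combo
    return total

# Two analyses of "saw the man with a telescope" packed into one forest:
# the PP attaches either to the NP or to the VP, sharing the saw/NP/PP subtrees.
saw = Node("V", 0, 1)
np = Node("NP", 1, 3)
pp = Node("PP", 3, 6)
np_pp = Node("NP", 1, 6, incoming=[[np, pp]])
vp = Node("VP", 0, 6, incoming=[
    [saw, np_pp],   # PP attaches inside the NP
    [saw, np, pp],  # PP attaches to the VP
])
print(num_trees(vp))  # → 2
```

The paper's model would encode such a forest with a neural network rather than count its trees, but the sharing shown here is what lets the encoder consider many alternative parses at little extra cost.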
Anthology ID:
C18-1120
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
1421–1429
URL:
https://aclanthology.org/C18-1120
Cite (ACL):
Poorya Zaremoodi and Gholamreza Haffari. 2018. Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1421–1429, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model (Zaremoodi & Haffari, COLING 2018)
PDF:
https://aclanthology.org/C18-1120.pdf