HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation

Brian Thompson; Rebecca Knowles; Xuan Zhang; Huda Khayrallah; Kevin Duh; Philipp Koehn

doi:10.18653/v1/D19-1142

Abstract

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated into statistical machine translation, it is unclear how best to incorporate them in the neural framework. In this work, we present the HABLex dataset, designed to test methods for integrating bilingual lexicons into neural machine translation. Our data consist of human-generated alignments of words and phrases in machine translation test sets for three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons that are well matched to the reference translations. We also present two simple baselines (constrained decoding and continued training) and an improvement to continued training that addresses overfitting.
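As an illustration of how human word and phrase alignments can yield a bilingual lexicon, the minimal sketch below collects source-phrase to target-phrase entries from aligned sentence pairs. It assumes a hypothetical alignment representation (half-open token spans on each side) and a hypothetical helper `extract_lexicon`; neither is the actual HABLex file format or the authors' tooling.

```python
from collections import defaultdict

# Hypothetical alignment format (NOT the HABLex release format):
# each alignment pairs a half-open source token span with a half-open target token span.
Span = tuple[int, int]
Example = tuple[list[str], list[str], list[tuple[Span, Span]]]


def extract_lexicon(examples: list[Example]) -> dict[str, set[str]]:
    """Map each aligned source phrase to the set of target phrases it was aligned to."""
    lexicon: dict[str, set[str]] = defaultdict(set)
    for src_tokens, tgt_tokens, alignments in examples:
        for (s0, s1), (t0, t1) in alignments:
            src_phrase = " ".join(src_tokens[s0:s1])
            tgt_phrase = " ".join(tgt_tokens[t0:t1])
            lexicon[src_phrase].add(tgt_phrase)
    return dict(lexicon)


if __name__ == "__main__":
    # Toy Russian-English pair with one word alignment and one phrase alignment.
    examples = [
        (
            ["Я", "видел", "Красную", "площадь"],
            ["I", "saw", "Red", "Square"],
            [((0, 1), (0, 1)), ((2, 4), (2, 4))],
        )
    ]
    print(extract_lexicon(examples))  # {'Я': {'I'}, 'Красную площадь': {'Red Square'}}
```

Because the target side of every entry is taken directly from the reference translation, a lexicon built this way is matched to the reference by construction, which is the property the abstract highlights.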

Anthology ID:: D19-1142
Original:: D19-1142v1
Version 2:: D19-1142v2
Volume:: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:: November
Year:: 2019
Address:: Hong Kong, China
Editors:: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:: EMNLP | IJCNLP
SIG:: SIGDAT
Publisher:: Association for Computational Linguistics
Pages:: 1382–1387
URL:: https://aclanthology.org/D19-1142/
DOI:: 10.18653/v1/D19-1142
Cite (ACL):: Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, and Philipp Koehn. 2019. HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1382–1387, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):: HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation (Thompson et al., EMNLP-IJCNLP 2019)
PDF:: https://aclanthology.org/D19-1142.pdf
