@inproceedings{thompson-etal-2019-hablex,
    title = "{HABL}ex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation",
    author = "Thompson, Brian  and
      Knowles, Rebecca  and
      Zhang, Xuan  and
      Khayrallah, Huda  and
      Duh, Kevin  and
      Koehn, Philipp",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-1142",
    doi = "10.18653/v1/D19-1142",
    pages = "1382--1387",
    abstract = "Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated in statistical machine translation, it is unclear how to best do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. Our data consists of human-generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons which are well matched to the reference. We also present two simple baselines (constrained decoding and continued training) and an improvement to continued training to address overfitting.",
}
