%0 Conference Proceedings
%T Vecalign: Improved Sentence Alignment in Linear Time and Space
%A Thompson, Brian
%A Koehn, Philipp
%Y Inui, Kentaro
%Y Jiang, Jing
%Y Ng, Vincent
%Y Wan, Xiaojun
%S Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
%D 2019
%8 November
%I Association for Computational Linguistics
%C Hong Kong, China
%F thompson-koehn-2019-vecalign
%X We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence embeddings. On a standard German–French test set, Vecalign outperforms the previous state-of-the-art method (which has quadratic time complexity and requires a machine translation system) by 5 F1 points. It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based Paracrawl pipeline.
%R 10.18653/v1/D19-1136
%U https://aclanthology.org/D19-1136
%U https://doi.org/10.18653/v1/D19-1136
%P 1342-1348