Mask-Align: Self-Supervised Neural Word Alignment

Chi Chen, Maosong Sun, Yang Liu


Abstract
Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing alignments from neural machine translation models, an approach that does not leverage the full context of the target sequence. In this paper, we propose Mask-Align, a self-supervised word alignment model that takes advantage of the full context on the target side. Our model masks out each target token and predicts it conditioned on both the source tokens and the remaining target tokens. This two-step process is based on the assumption that the source token contributing most to recovering the masked target token should be aligned. We also introduce an attention variant called leaky attention, which alleviates the problem of unexpectedly high cross-attention weights on special tokens such as periods. Experiments on four language pairs show that our model outperforms previous unsupervised neural aligners and obtains new state-of-the-art results.
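The two ideas in the abstract, aligning each masked target token to the source token that contributes most to recovering it and giving attention a "leak" slot so that weight need not pile up on tokens like periods, can be illustrated with a small sketch. It is not the authors' released implementation (THUNLP-MT/Mask-Align). It assumes that "contribution" is measured by the cross-attention weight from the masked-token predictor, models leaky attention as one extra key whose weight is discarded when reading off alignments, and uses hypothetical names (`leaky_attention`, `extract_alignment`, `threshold`). Value vectors and the network that builds each query from the remaining target tokens are omitted.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_attention(
    query: Vector, src_keys: Sequence[Vector], leak_key: Vector
) -> Tuple[List[float], float]:
    """Scaled dot-product attention over the source keys plus one extra
    'leak' key (a hypothetical stand-in for a learned parameter).

    Returns the weights on the real source tokens and the weight absorbed
    by the leak slot; together they sum to 1.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * _dot(query, k) for k in src_keys]
    scores.append(scale * _dot(query, leak_key))
    probs = _softmax(scores)
    return probs[:-1], probs[-1]


def extract_alignment(
    attn: Sequence[Sequence[float]], threshold: float = 0.0
) -> Set[Tuple[int, int]]:
    """attn[j][i] is the weight the predictor of masked target token j puts
    on source token i. Link j to its highest-weighted source token unless
    that weight is below `threshold` (a hypothetical cut-off), in which
    case target token j is left unaligned.
    """
    links = set()
    for j, row in enumerate(attn):
        i = max(range(len(row)), key=lambda k: row[k])
        if row[i] >= threshold:
            links.add((i, j))  # (source index, target index)
    return links


# Toy example: two source tokens, two masked target positions.
src_keys = [[2.0, 0.0], [0.0, 2.0]]
leak_key = [0.0, 0.0]
queries = [[1.0, 0.0], [0.0, 0.1]]
attn = [leaky_attention(q, src_keys, leak_key)[0] for q in queries]
print(extract_alignment(attn, threshold=0.4))  # prints {(0, 0)}
```

In this toy run the second target position has no strongly matching source key, so a sizeable share of its weight goes to the leak slot and its best source weight falls below the cut-off, leaving it unaligned. Without the leak slot, softmax would force all of that mass onto real source tokens, which is one way spurious links to tokens such as periods can arise.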
Anthology ID:
2021.acl-long.369
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
4781–4791
URL:
https://aclanthology.org/2021.acl-long.369
DOI:
10.18653/v1/2021.acl-long.369
Cite (ACL):
Chi Chen, Maosong Sun, and Yang Liu. 2021. Mask-Align: Self-Supervised Neural Word Alignment. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4781–4791, Online. Association for Computational Linguistics.
Cite (Informal):
Mask-Align: Self-Supervised Neural Word Alignment (Chen et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.369.pdf
Video:
https://aclanthology.org/2021.acl-long.369.mp4
Code:
THUNLP-MT/Mask-Align