Entropy– and Distance-Regularized Attention Improves Low-Resource Neural Machine Translation

Ali Araabi; Vlad Niculae; Christof Monz

Abstract

Transformer-based models in Neural Machine Translation (NMT) rely heavily on multi-head attention for capturing dependencies within and across source and target sequences. In Transformers, attention mechanisms dynamically determine which parts of the sentence to focus on in the encoder and decoder through self-attention and cross-attention. Our experiments show that high-resource NMT systems often exhibit a specific peaked attention distribution, indicating a focus on key elements. However, in low-resource NMT, attention tends to be dispersed throughout the sentence, lacking the focus demonstrated by high-resource models. To tackle this issue, we present EaDRA (Entropy– and Distance-Regularized Attention), which introduces an inductive bias to prioritize essential elements and guide the attention mechanism accordingly. Extensive experiments using EaDRA on diverse low-resource language pairs demonstrate significant improvements in translation quality, while incurring negligible computational cost.
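The contrast the abstract draws between peaked and dispersed attention can be made concrete with a small numerical sketch. The code below is an illustration only, not the paper's formulation of EaDRA: the function names, the toy score vectors, and the choice of Shannon entropy and attention-weighted positional distance as the two measures are all assumptions made here for illustration, prompted by the words "entropy" and "distance" in the title. The abstract does not state how the paper defines these quantities or how it combines them with the training loss.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy (in nats) of a single attention distribution.

    Low entropy means peaked attention; high entropy means dispersed attention.
    """
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def expected_distance(weights, query_pos):
    """Attention-weighted mean of |key position - query position|.

    Hypothetical measure chosen for this sketch; the paper may define
    its distance term differently.
    """
    return sum(w * abs(j - query_pos) for j, w in enumerate(weights))


# Toy attention rows over a four-token sentence (assumed values).
peaked = softmax([8.0, 0.0, 0.0, 0.0])     # one key dominates
dispersed = softmax([0.0, 0.0, 0.0, 0.0])  # uniform over all keys

for name, row in (("peaked", peaked), ("dispersed", dispersed)):
    print(name, attention_entropy(row), expected_distance(row, query_pos=0))
```

Under these assumptions, the uniform row reaches the maximum possible entropy for four keys, log 4, while the peaked row stays close to zero. A regularizer in this spirit would add a penalty based on such quantities to the translation loss during training.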

Anthology ID: 2024.amta-research.13
Volume: Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month: September
Year: 2024
Address: Chicago, USA
Editors: Rebecca Knowles, Akiko Eriguchi, Shivali Goel
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 140–153
URL: https://aclanthology.org/2024.amta-research.13
Cite (ACL): Ali Araabi, Vlad Niculae, and Christof Monz. 2024. Entropy– and Distance-Regularized Attention Improves Low-Resource Neural Machine Translation. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 140–153, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal): Entropy– and Distance-Regularized Attention Improves Low-Resource Neural Machine Translation (Araabi et al., AMTA 2024)
PDF: https://aclanthology.org/2024.amta-research.13.pdf
