Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency

Carlos Escolano, Francesca De Luca Fornaciari, Maite Melero


Abstract
Transformer models often demand a vast amount of training data to achieve the desired level of performance. However, this data requirement poses a major challenge for low-resource languages seeking access to high-quality systems, particularly in tasks like Machine Translation. To address this issue, we propose adding Dropout to Transformer’s Residual Connections. Our experimental results demonstrate that this modification effectively mitigates overfitting during training, resulting in substantial performance gains of over 4 BLEU points on a dataset consisting of merely 10 thousand examples.
Anthology ID:
2024.sigul-1.35
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
294–299
URL:
https://aclanthology.org/2024.sigul-1.35
Cite (ACL):
Carlos Escolano, Francesca De Luca Fornaciari, and Maite Melero. 2024. Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 294–299, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency (Escolano et al., SIGUL-WS 2024)
PDF:
https://aclanthology.org/2024.sigul-1.35.pdf