Best Practices of Successive Halving on Neural Machine Translation and Large Language Models

Xuan Zhang, Kevin Duh


Abstract
Hyperparameter optimization (HPO) enhances neural machine translation (NMT) models but demands substantial computational resources. Successive halving, a multi-fidelity HPO method, mitigates this cost by stopping unpromising models early and allocating more resources to promising ones. This method is particularly relevant for NMT and large language models, which are computationally intensive. However, successive halving relies on a noisy estimate of model performance and assumes that early performance is highly correlated with final performance. We introduce a table-lookup benchmark dataset to study the reliability of successive halving and propose best practices for its application to NMT and large language models.
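For readers unfamiliar with the method, the sketch below illustrates generic successive halving in Python. It is not the authors' implementation: the function names, the halving factor, the budget schedule, and the use of a dev-set score as the early-performance signal are illustrative assumptions.

def successive_halving(configs, train_fn, eval_fn, min_budget=1, eta=2):
    """Generic successive halving (a sketch, not the paper's exact setup).

    configs:    list of hyperparameter configurations to compare
    train_fn:   train_fn(config, budget) -> model trained for `budget` units
    eval_fn:    eval_fn(model) -> score, higher is better (e.g. dev BLEU)
    min_budget: budget (e.g. epochs) given to every config in the first rung
    eta:        keep the top 1/eta configs after each rung
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Train each surviving config for the current budget and score it
        # on a (noisy) early-performance estimate.
        scored = [(eval_fn(train_fn(c, budget)), c) for c in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the top 1/eta configs and multiply the budget by eta.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget *= eta
    return survivors[0]

With eta=2 and a minimum budget of one epoch, 16 candidate configurations would be narrowed to 8, 4, 2, and finally 1 over successive rungs, with each surviving configuration receiving a progressively larger training budget.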
Anthology ID:
2024.amta-research.12
Volume:
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month:
September
Year:
2024
Address:
Chicago, USA
Editors:
Rebecca Knowles, Akiko Eriguchi, Shivali Goel
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
130–139
URL:
https://aclanthology.org/2024.amta-research.12
Cite (ACL):
Xuan Zhang and Kevin Duh. 2024. Best Practices of Successive Halving on Neural Machine Translation and Large Language Models. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 130–139, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Best Practices of Successive Halving on Neural Machine Translation and Large Language Models (Zhang & Duh, AMTA 2024)
PDF:
https://aclanthology.org/2024.amta-research.12.pdf