Better Alignment with Instruction Back-and-Forth Translation

Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li


Abstract
We propose a new method, instruction back-and-forth translation, to improve the quality of instruction-tuning data used for aligning large language models (LLMs). Given preprocessed texts from an initial web corpus (e.g., Dolma (Soldaini et al., 2024)), we generate synthetic instructions using the backtranslation approach proposed by Li et al. (2023), filter the generated data, and rewrite the responses to further improve their quality based on the initial texts. Given similar quantities of instructions, fine-tuning Llama-2 on our (synthetic instruction, rewritten response) pairs yields better AlpacaEval win rates than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct, at both the 7B and 70B parameter scales. We also show that rewriting the responses with an LLM differs from direct distillation: the former yields a better win rate at the 70B scale, and the two text distributions are clearly separated in embedding space. In addition, our analyses show that the backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained via distillation. Overall, we find that instruction back-and-forth translation combines the best of both worlds: it exploits the diversity and quantity of information found on the web, while ensuring the quality of the responses, which is necessary for effective alignment.
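Although the paper's code is not reproduced here, the abstract's three-stage pipeline (backtranslate an instruction from web text, filter the candidate pair, rewrite the response) can be sketched in a few lines of Python. Everything below is a hedged illustration: the generate stub, the prompt wording, and the helper names are assumptions made for exposition, not the authors' released prompts or implementation.

def generate(prompt):
    """Placeholder for a call to an instruction-tuned LLM; plug in any model or API."""
    raise NotImplementedError

def backtranslate_instruction(web_text):
    # Backward step (Li et al., 2023): predict an instruction that the
    # web document would plausibly answer.
    return generate(
        "Write an instruction for which the following text would be a "
        "good response:\n\n" + web_text)

def passes_filter(instruction, web_text):
    # Filtering step: keep only candidate pairs the model rates highly.
    rating = generate(
        "On a scale of 1-5, rate how well the response answers the "
        f"instruction.\nInstruction: {instruction}\nResponse: {web_text}\nRating:")
    return rating.strip().startswith(("4", "5"))

def rewrite_response(instruction, web_text):
    # Rewriting step: improve the original web text conditioned on both
    # the synthetic instruction and the text itself.
    return generate(
        "Rewrite the text so that it directly and completely answers the "
        f"instruction, keeping its information.\nInstruction: {instruction}\n"
        f"Text: {web_text}")

def build_training_pair(web_text):
    instruction = backtranslate_instruction(web_text)
    if not passes_filter(instruction, web_text):
        return None  # drop low-quality candidates
    return instruction, rewrite_response(instruction, web_text)

The key design point, per the abstract, is that rewrite_response conditions on the original web document as well as the synthetic instruction, which is what separates the method from direct distillation, where a response would be generated from the instruction alone.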
Anthology ID: 2024.findings-emnlp.777
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13289–13308
URL: https://aclanthology.org/2024.findings-emnlp.777
Cite (ACL): Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, and Xian Li. 2024. Better Alignment with Instruction Back-and-Forth Translation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13289–13308, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Better Alignment with Instruction Back-and-Forth Translation (Nguyen et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.777.pdf