@inproceedings{corrado-etal-2025-automixalign,
title = "{A}uto{M}ix{A}lign: Adaptive Data Mixing for Multi-Task Preference Optimization in {LLM}s",
author = "Corrado, Nicholas E. and
Katz-Samuels, Julian and
Devraj, Adithya M and
Yun, Hyokun and
Zhang, Chao and
Xu, Yi and
Pan, Yi and
Yin, Bing and
Chilimbi, Trishul",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.990/",
doi = "10.18653/v1/2025.acl-long.990",
pages = "20234--20258",
ISBN = "979-8-89176-251-0",
abstract = "When aligning large language models (LLMs), their performance across various tasks (such as being helpful, harmless, and honest) is heavily influenced by the composition of the training data. However, it is difficult to determine what mixture of data should be used to produce a model with strong performance across all tasks. Existing approaches rely on large ablation studies, heuristics, or human intuition, though these can be prohibitively expensive and suboptimal. We study this problem in the context of preference optimization via DPO and propose a novel and theoretically justified algorithm, AutoMixAlign (AMA), that adaptively mixes datasets during LLM training to balance performance across multiple tasks. AMA first trains \textit{specialist models} for each task to determine losses that corresponding to strong task performance. Next, AMA trains a generalist model using a novel minimax optimization that prioritizes tasks for which generalist model losses are furthest from specialist model losses. We introduce two algorithms to optimize this problem: (1) AMA-R adaptively reweights the objective to prioritize tasks, and (2) AMA-S adaptively adjusts how much data is sampled from each task to prioritize tasks. Both algorithms achieve a convergence rate of $O(1/\sqrt{T})$ in the convex case. AMA-R{'}s convergence result immediately follows from Sagawa et. al, 2019, and we provide a convergence proof for AMA-S using techniques from online learning such as EXP3 (Auer et. al, 2002). We evaluate AMA on several multitask alignment setups, and observe that AMA outperforms the standard alignment approach which simply optimizes the total loss across all tasks and also outperforms model-merging methods."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="corrado-etal-2025-automixalign">
<titleInfo>
<title>AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nicholas</namePart>
<namePart type="given">E</namePart>
<namePart type="family">Corrado</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Julian</namePart>
<namePart type="family">Katz-Samuels</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Adithya</namePart>
<namePart type="given">M</namePart>
<namePart type="family">Devraj</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hyokun</namePart>
<namePart type="family">Yun</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chao</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yi</namePart>
<namePart type="family">Xu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yi</namePart>
<namePart type="family">Pan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bing</namePart>
<namePart type="family">Yin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Trishul</namePart>
<namePart type="family">Chilimbi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>When aligning large language models (LLMs), their performance across various tasks (such as being helpful, harmless, and honest) is heavily influenced by the composition of the training data. However, it is difficult to determine what mixture of data should be used to produce a model with strong performance across all tasks. Existing approaches rely on large ablation studies, heuristics, or human intuition, though these can be prohibitively expensive and suboptimal. We study this problem in the context of preference optimization via DPO and propose a novel and theoretically justified algorithm, AutoMixAlign (AMA), that adaptively mixes datasets during LLM training to balance performance across multiple tasks. AMA first trains specialist models for each task to determine losses that correspond to strong task performance. Next, AMA trains a generalist model using a novel minimax optimization that prioritizes tasks for which generalist model losses are furthest from specialist model losses. We introduce two algorithms to optimize this problem: (1) AMA-R adaptively reweights the objective to prioritize tasks, and (2) AMA-S adaptively adjusts how much data is sampled from each task to prioritize tasks. Both algorithms achieve a convergence rate of O(1/√T) in the convex case. AMA-R’s convergence result follows immediately from Sagawa et al. (2019), and we provide a convergence proof for AMA-S using techniques from online learning such as EXP3 (Auer et al., 2002). We evaluate AMA on several multitask alignment setups and observe that AMA outperforms the standard alignment approach, which simply optimizes the total loss across all tasks, and also outperforms model-merging methods.</abstract>
<identifier type="citekey">corrado-etal-2025-automixalign</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.990</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.990/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>20234</start>
<end>20258</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
%A Corrado, Nicholas E.
%A Katz-Samuels, Julian
%A Devraj, Adithya M.
%A Yun, Hyokun
%A Zhang, Chao
%A Xu, Yi
%A Pan, Yi
%A Yin, Bing
%A Chilimbi, Trishul
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F corrado-etal-2025-automixalign
%X When aligning large language models (LLMs), their performance across various tasks (such as being helpful, harmless, and honest) is heavily influenced by the composition of the training data. However, it is difficult to determine what mixture of data should be used to produce a model with strong performance across all tasks. Existing approaches rely on large ablation studies, heuristics, or human intuition, though these can be prohibitively expensive and suboptimal. We study this problem in the context of preference optimization via DPO and propose a novel and theoretically justified algorithm, AutoMixAlign (AMA), that adaptively mixes datasets during LLM training to balance performance across multiple tasks. AMA first trains specialist models for each task to determine losses that correspond to strong task performance. Next, AMA trains a generalist model using a novel minimax optimization that prioritizes tasks for which generalist model losses are furthest from specialist model losses. We introduce two algorithms to optimize this problem: (1) AMA-R adaptively reweights the objective to prioritize tasks, and (2) AMA-S adaptively adjusts how much data is sampled from each task to prioritize tasks. Both algorithms achieve a convergence rate of O(1/√T) in the convex case. AMA-R’s convergence result follows immediately from Sagawa et al. (2019), and we provide a convergence proof for AMA-S using techniques from online learning such as EXP3 (Auer et al., 2002). We evaluate AMA on several multitask alignment setups and observe that AMA outperforms the standard alignment approach, which simply optimizes the total loss across all tasks, and also outperforms model-merging methods.
%R 10.18653/v1/2025.acl-long.990
%U https://aclanthology.org/2025.acl-long.990/
%U https://doi.org/10.18653/v1/2025.acl-long.990
%P 20234-20258
Markdown (Informal)
[AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs](https://aclanthology.org/2025.acl-long.990/) (Corrado et al., ACL 2025)
ACL
- Nicholas E. Corrado, Julian Katz-Samuels, Adithya M Devraj, Hyokun Yun, Chao Zhang, Yi Xu, Yi Pan, Bing Yin, and Trishul Chilimbi. 2025. AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 20234–20258, Vienna, Austria. Association for Computational Linguistics.
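
To make the abstract's description concrete, below is a minimal, hypothetical sketch of the AMA-S idea it outlines: maintain EXP3-style sampling weights over tasks and steer training toward tasks whose generalist loss most exceeds the corresponding specialist model's loss. The function and parameter names (`ama_s_sketch`, `generalist_loss_fn`, `update_fn`, `eta`) are illustrative assumptions, not the paper's API, and the loop simplifies away details of the actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def ama_s_sketch(task_batches, specialist_losses, generalist_loss_fn,
                 update_fn, steps=1000, eta=0.1):
    """Hypothetical AMA-S-style loop.

    task_batches: per-task batch iterators.
    specialist_losses: target loss achieved by each task's specialist model.
    generalist_loss_fn(batch) -> scalar loss of the generalist model.
    update_fn(batch): one optimizer step on the generalist model (min player).
    """
    k = len(task_batches)
    log_weights = np.zeros(k)  # uniform task sampling to start

    for _ in range(steps):
        # Softmax of the log-weights gives the current sampling distribution.
        probs = np.exp(log_weights - log_weights.max())
        probs /= probs.sum()

        i = int(rng.choice(k, p=probs))   # sample a task
        batch = next(task_batches[i])
        loss = generalist_loss_fn(batch)
        update_fn(batch)                  # minimize over model parameters

        # Excess loss over the specialist target is the "gain" for the max
        # player; the importance-weighted EXP3-style update upweights tasks
        # that lag furthest behind their specialists.
        excess = loss - specialist_losses[i]
        log_weights[i] += eta * excess / probs[i]

    return probs
```

The O(1/√T) convex-case guarantee cited in the abstract applies to the paper's actual algorithms and analysis, not to this simplified loop; the sketch only shows why prioritizing tasks by excess loss over specialist loss is a bandit-style (EXP3) sampling problem.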