Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks

Weiyi Lu, Sunny Rajagopalan, Priyanka Nigam, Jaspreet Singh, Xiaodi Sun, Yi Xu, Belinda Zeng, Trishul Chilimbi


Abstract
Multi-task learning (MTL) aims to solve multiple tasks jointly by sharing a base representation among them. This can lead to more efficient learning and better generalization, as compared to learning each task individually. However, one issue that often arises in MTL is that tasks converge at different speeds due to differences in task difficulty, so it can be challenging to achieve the best performance on all tasks simultaneously with a single model checkpoint. Various techniques have been proposed to address discrepancies in task convergence rate, including weighting the per-task losses and modifying task gradients. In this work, we propose a novel approach that avoids requiring all tasks to converge at the same rate and instead allows for “asynchronous” convergence, where each task converges on its own schedule. As our main contribution, we monitor per-task validation metrics and, once a task has converged, switch that task to a knowledge distillation loss instead of continuing to train on its true labels. This prevents the model from overfitting on converged tasks while it learns the remaining tasks. We evaluate the proposed method in two 5-task MTL setups consisting of internal e-commerce datasets. The results show that our method consistently outperforms existing loss weighting and gradient balancing approaches, achieving average improvements of 0.9% and 1.5% over the best performing baseline model in the two setups, respectively.
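
The abstract describes the core mechanism at a high level: train each task on its true labels until its validation metric converges, then switch that task to a distillation loss against a frozen snapshot of the model taken at convergence. The sketch below illustrates this idea in a PyTorch-style training loop; it is only a minimal illustration, and all names and interfaces (e.g. `model(inputs, task=name)`, `has_converged`, `kd_loss`) are hypothetical assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the "asynchronous convergence" idea from the abstract.
# Assumes a shared model with per-task heads exposed as model(inputs, task=name).
import copy
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label knowledge distillation loss (KL divergence on temperature-scaled logits)."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

def train_async_convergence(model, tasks, optimizer, num_epochs, has_converged):
    """tasks: dict task_name -> (train_loader, val_metric_fn);
    has_converged(task_name, metric_history) -> bool is a user-supplied convergence check."""
    teachers = {}  # frozen per-task model snapshots taken at convergence
    metric_history = {name: [] for name in tasks}

    for epoch in range(num_epochs):
        for name, (loader, _) in tasks.items():
            for inputs, labels in loader:
                logits = model(inputs, task=name)
                if name in teachers:
                    # Converged task: distill from the frozen snapshot
                    # instead of continuing to train on the true labels.
                    with torch.no_grad():
                        teacher_logits = teachers[name](inputs, task=name)
                    loss = kd_loss(logits, teacher_logits)
                else:
                    # Unconverged task: standard supervised loss.
                    loss = F.cross_entropy(logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        # After each epoch, check per-task validation metrics and freeze
        # a teacher copy for any task that has just converged.
        for name, (_, val_metric_fn) in tasks.items():
            metric_history[name].append(val_metric_fn(model))
            if name not in teachers and has_converged(name, metric_history[name]):
                teachers[name] = copy.deepcopy(model).eval()
```

Under this sketch, each task switches losses independently, so no task is forced to wait for (or be degraded by) the convergence schedule of the others.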
Anthology ID:
2022.naacl-industry.18
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Editors:
Anastassia Loukina, Rashmi Gangadharaiah, Bonan Min
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
149–159
URL:
https://aclanthology.org/2022.naacl-industry.18
DOI:
10.18653/v1/2022.naacl-industry.18
Cite (ACL):
Weiyi Lu, Sunny Rajagopalan, Priyanka Nigam, Jaspreet Singh, Xiaodi Sun, Yi Xu, Belinda Zeng, and Trishul Chilimbi. 2022. Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 149–159, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks (Lu et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-industry.18.pdf
Video:
https://aclanthology.org/2022.naacl-industry.18.mp4
Data
GLUE