Multilingual Continual Learning using Attention Distillation

Sanjay Agrawal, Deep Nayak, Vivek Varadarajan Sembium


Abstract
Query-product relevance classification is crucial for e-commerce stores like Amazon, ensuring accurate search results that match customer intent. Using a unified multilingual model across multiple languages/marketplaces tends to yield superior outcomes but also presents challenges, especially in maintaining performance across all languages when the model is updated or expanded to include a new one. To tackle this, we examine a multilingual continual learning (CL) framework focused on relevance classification tasks and address the issue of catastrophic forgetting. We propose a novel continual learning approach called attention distillation, which sequentially adds adapters for each new language and incorporates a fusion layer above the language-specific adapters. This fusion layer distills attention scores from the previously trained fusion layer, focusing on the older adapters. Additionally, translating a portion of the new language's data into the older languages supports backward knowledge transfer. Our method reduces trainable parameters by 80%, enhancing computational efficiency and enabling frequent updates, while achieving a 1-3% ROC-AUC improvement over single-marketplace baselines and outperforming SOTA CL methods on proprietary and external datasets.
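The abstract describes the mechanism only at a high level. The sketch below is one way an adapter-fusion layer with attention distillation could be wired up in PyTorch; it is not the authors' implementation. The names AdapterFusion and attention_distillation_loss, the dot-product attention scoring, and the KL-divergence distillation term are all illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's code) of attention distillation over
# language-specific adapters: a fusion layer attends over per-adapter outputs, and
# its attention over the OLD adapters is distilled from the frozen fusion layer
# that was trained before the new language was added.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterFusion(nn.Module):
    """Fusion layer: scaled dot-product attention over stacked adapter outputs."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hidden, adapter_outputs):
        # hidden: (batch, hidden_dim); adapter_outputs: (batch, n_adapters, hidden_dim)
        q = self.query(hidden).unsqueeze(1)                 # (batch, 1, hidden_dim)
        k = self.key(adapter_outputs)                       # (batch, n_adapters, hidden_dim)
        scores = (q * k).sum(-1) / hidden.size(-1) ** 0.5   # (batch, n_adapters)
        attn = scores.softmax(dim=-1)
        fused = (attn.unsqueeze(-1) * adapter_outputs).sum(dim=1)
        return fused, attn

def attention_distillation_loss(new_attn, old_attn, n_old: int):
    """KL divergence between the new fusion layer's attention, restricted and
    re-normalised over the old adapters, and the frozen previous fusion layer's
    attention (the teacher). The exact loss form is an assumption."""
    new_old = new_attn[:, :n_old]
    new_old = new_old / new_old.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    old = old_attn / old_attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return F.kl_div(new_old.clamp_min(1e-8).log(), old, reduction="batchmean")

if __name__ == "__main__":
    # Toy shapes: 2 old adapters + 1 new adapter, hidden size 8, batch of 4.
    fusion_new, fusion_old = AdapterFusion(8), AdapterFusion(8)
    h = torch.randn(4, 8)
    old_outs = torch.randn(4, 2, 8)   # outputs of the two frozen old-language adapters
    new_outs = torch.randn(4, 1, 8)   # output of the new-language adapter
    _, attn_new = fusion_new(h, torch.cat([old_outs, new_outs], dim=1))
    with torch.no_grad():
        _, attn_old = fusion_old(h, old_outs)   # frozen previously trained fusion layer
    loss = attention_distillation_loss(attn_new, attn_old, n_old=2)
```

In a setup like this, only the new-language adapter and the new fusion layer would receive gradients, while the backbone, the older adapters, and the previous fusion layer stay frozen, which is consistent with the abstract's claim of a large reduction in trainable parameters.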
Anthology ID:
2025.coling-industry.8
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
91–99
URL:
https://aclanthology.org/2025.coling-industry.8/
Cite (ACL):
Sanjay Agrawal, Deep Nayak, and Vivek Varadarajan Sembium. 2025. Multilingual Continual Learning using Attention Distillation. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 91–99, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Multilingual Continual Learning using Attention Distillation (Agrawal et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-industry.8.pdf