Multilingual Continual Learning using Attention Distillation

Sanjay Agrawal; Deep Nayak; Vivek Varadarajan Sembium

Abstract

Query-product relevance classification is crucial for e-commerce stores like Amazon because it ensures that search results match customer intent. A unified multilingual model serving multiple languages/marketplaces tends to yield superior outcomes, but it also presents challenges. In particular, performance must be maintained across all languages whenever the model is updated or expanded to include a new language. To tackle this, we examine a multilingual continual learning (CL) framework focused on relevance classification tasks and address the issue of catastrophic forgetting. We propose a novel continual learning approach called attention distillation. It sequentially adds an adapter for each new language and places a fusion layer above the language-specific adapters. This fusion layer distills attention scores from the previously trained fusion layer, focusing on the older adapters. Additionally, translating a portion of the new language's data into the older languages supports backward knowledge transfer. Our method reduces trainable parameters by 80%, which improves computational efficiency and enables frequent updates. It also achieves a 1-3% ROC-AUC improvement over single-marketplace baselines and outperforms state-of-the-art (SOTA) CL methods on proprietary and external datasets.
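The abstract describes attention distillation only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a plain dot-product fusion attention and a KL-divergence loss between two distributions: the previous fusion layer's attention over the old adapters, and the new fusion layer's attention renormalized over those same adapters. The function names, the renormalization step, the toy vectors, and the weight `lam` are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_attention(query: Sequence[float],
                     adapter_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Attention weights of a fusion layer over its adapters' outputs.

    Assumption: plain dot-product scoring. `query` stands in for the fusion
    layer's projected query, so a retrained fusion layer yields a different query.
    """
    scores = [sum(q * a for q, a in zip(query, out)) for out in adapter_outputs]
    return softmax(scores)


def attention_distillation_loss(teacher_attn: Sequence[float],
                                student_attn: Sequence[float],
                                eps: float = 1e-12) -> float:
    """KL(teacher || student) computed over the old adapters only.

    teacher_attn: previous fusion layer's weights over the k old adapters.
    student_attn: new fusion layer's weights over the k old adapters followed
    by the new language's adapter. The student's weights on the old adapters
    are renormalized to sum to 1 before comparison (an assumption).
    """
    k = len(teacher_attn)
    old_part = student_attn[:k]
    z = sum(old_part)
    student_old = [w / z for w in old_part]
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher_attn, student_old))


if __name__ == "__main__":
    old_adapters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy outputs of two earlier-language adapters
    new_adapter = [0.0, 0.0, 1.0]                       # toy output of the newly added adapter

    teacher = fusion_attention([0.5, -1.0, 0.2], old_adapters)                 # previous fusion layer
    student = fusion_attention([0.2, 0.1, 0.4], old_adapters + [new_adapter])  # new fusion layer

    task_loss = 0.3  # placeholder for the relevance-classification loss
    lam = 1.0        # hypothetical weight on the distillation term
    total = task_loss + lam * attention_distillation_loss(teacher, student)
    print(f"teacher={teacher} student={student} total_loss={total:.4f}")
```

A softmax restricted to a subset of its inputs and then renormalized equals the softmax over that subset alone. As a result, this loss is zero whenever the new fusion layer scores the old adapters exactly as the old one did. The share of attention assigned to the new adapter is left unconstrained.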

Anthology ID:: 2025.coling-industry.8
Volume:: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:: January
Year:: 2025
Address:: Abu Dhabi, UAE
Editors:: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:: COLING
Publisher:: Association for Computational Linguistics
Pages:: 91–99
URL:: https://aclanthology.org/2025.coling-industry.8/
Cite (ACL):: Sanjay Agrawal, Deep Nayak, and Vivek Varadarajan Sembium. 2025. Multilingual Continual Learning using Attention Distillation. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 91–99, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):: Multilingual Continual Learning using Attention Distillation (Agrawal et al., COLING 2025)
PDF:: https://aclanthology.org/2025.coling-industry.8.pdf