CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation

Han Zhang; Sheng Zhang; Yang Xiang; Bin Liang (梁斌); Jinsong Su; Zhongjian Miao; Hui Wang; Ruifeng Xu (徐睿峰)

doi:10.18653/v1/2022.findings-emnlp.30


Abstract

Continual Language Learning (CLL) in multilingual translation is inevitable whenever new languages must be translated. Because unified and generalized benchmarks are lacking, the evaluation of existing methods is strongly influenced by experimental design, which usually differs substantially from industrial demands. In this work, we propose CLLE, the first Continual Language Learning Evaluation benchmark in multilingual translation. CLLE consists of a Chinese-centric corpus, CN-25, and two CLL tasks designed for real and disparate demands: the close-distance language continual learning task and the language family continual learning task. Unlike existing translation benchmarks, CLLE considers several restrictions for CLL, including domain distribution alignment, content overlap, language diversity, and corpus balance. Furthermore, we propose COMETA, a novel framework based on Constrained Optimization and META-learning, which alleviates catastrophic forgetting and the dependency on historical training data by using a meta-model to retain the parameters that are important for old languages. Our experiments show that CLLE is a challenging CLL benchmark and that our proposed method is effective compared with other strong baselines. Because the construction of the corpus, the task design, and the evaluation method are independent of the centric language, we also construct and release the English-centric corpus EN-25 to facilitate academic research.

Anthology ID: 2022.findings-emnlp.30
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 428–443
URL: https://aclanthology.org/2022.findings-emnlp.30/
DOI: 10.18653/v1/2022.findings-emnlp.30
Cite (ACL): Han Zhang, Sheng Zhang, Yang Xiang, Bin Liang, Jinsong Su, Zhongjian Miao, Hui Wang, and Ruifeng Xu. 2022. CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 428–443, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation (Zhang et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.30.pdf
