Wenhao Zhu


pdf bib
kNN-BOX: A Unified Framework for Nearest Neighbor Generation
Wenhao Zhu | Qianfeng Zhao | Yunzhe Lv | Shujian Huang | Siheng Zhao | Sizhe Liu | Jiajun Chen
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Augmenting the base neural model with a token-level symbolic datastore is a novel generation paradigm and has achieved promising results in machine translation (MT). In this paper, we introduce a unified framework kNN-BOX, which enables quick development and visualization for this novel paradigm. kNN-BOX decomposes the datastore-augmentation approach into three modules: datastore, retriever and combiner, thus putting diverse kNN generation methods into a unified way. Currently, kNN-BOX has provided implementation of seven popular kNN-MT variants, covering research from performance enhancement to efficiency optimization. It is easy for users to reproduce these existing work or customize their own models. Besides, users can interact with their kNN generation systems with kNN-BOX to better understand the underlying inference process in a visualized way. In experiment section, we apply kNN-BOX for machine translation and three other seq2seq generation tasks (text simplification, paraphrase generation and question generation). Experiment results show that augmenting the base neural model with kNN-BOX can bring large performance improvement in all these tasks. The code and document of kNN-BOX is available at https://github.com/NJUNLP/knn-box. The demo can be accessed at http://nlp.nju.edu.cn/demo/knn-box/. The introduction video is available at https://www.youtube.com/watch?v=m0eJldHVR3w.


pdf bib
机器翻译和大语言模型研究进展(Research Development of Machine translation and Large Language Model)
Wenhao Zhu (文昊 朱) | Hao Zhou (昊 周) | Changjiang Gao (长江 高) | Sizhe Liu (斯哲 刘) | Shujian Huang (书剑 黄)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)


pdf bib
What Knowledge Is Needed? Towards Explainable Memory for kNN-MT Domain Adaptation
Wenhao Zhu | Shujian Huang | Yunzhe Lv | Xin Zheng | Jiajun Chen
Findings of the Association for Computational Linguistics: ACL 2023

kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually saves all target language token occurrences in the parallel corpus. As a result, the constructed datastore is usually large and possibly redundant. In this paper, we investigate the interpretability issue of this approach: what knowledge does the NMT model need? We propose the notion of local correctness (LAC) as a new angle, which describes the potential translation correctness for a single entry and for a given neighborhood. Empirical study shows that our investigation successfully finds the conditions where the NMT model could easily fail and need related knowledge. Experiments on six diverse target domains and two language-pairs show that pruning according to local correctness brings a light and more explainable memory for kNN-MT domain adaptation.

pdf bib
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan | Yinquan Lu | Wenhao Zhu | Lingpeng Kong | Lei Li | Yu Qiao | Jingjing Xu
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method.code and data repo: https://github.com/CONE-MT/Lego-MT.git.

pdf bib
INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation
Wenhao Zhu | Jingjing Xu | Shujian Huang | Lingpeng Kong | Jiajun Chen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural machine translation has achieved promising results on many translation tasks. However, previous studies have shown that neural models induce a non-smooth representation space, which harms its generalization results. Recently, kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference. Despite promising results, kNN-MT usually requires large inference overhead. We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters. The new parameters are then used to refresh the whole representation datastore to get new kNN knowledge asynchronously. This loop keeps running until convergence. Experiments on four benchmark datasets show that INK achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.


pdf bib
FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation
Wenhao Zhu | Shujian Huang | Tong Pu | Pingxuan Huang | Xu Zhang | Jian Yu | Wei Chen | Yanfeng Wang | Jiajun Chen
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Previous research for adapting a general neural machine translation (NMT) model into a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative of such challenging scenarios is to deploy a translation system for a conference with a specific topic, e.g., global warming or coronavirus, where there are usually extremely less resources due to the limited schedule. To motivate wider investigation in such a scenario, we present a real-world fine-grained domain adaptation task in machine translation (FGraDA). The FGraDA dataset consists of Chinese-English translation task for four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smart phone. Each sub-domain is equipped with a development set and test set for evaluation purposes. To be closer to reality, FGraDA does not employ any in-domain bilingual training data but provides bilingual dictionaries and wiki knowledge base, which can be easier obtained within a short time. We benchmark the fine-grained domain adaptation task and present in-depth analyses showing that there are still challenging problems to further improve the performance with heterogeneous resources.