Pan Liu
2024
Enable Fast Sampling for Seq2Seq Text Diffusion
Pan Liu | Xiaohua Tian | Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
Diffusion models exhibit promising capacity for generating high-quality text. However, owing to the curved nature of the generation path, they must traverse numerous steps to guarantee text quality. In this paper, we propose an efficient model, FMSeq, which utilizes flow matching to straighten the generation path, thereby enabling fast sampling for diffusion-based seq2seq text generation. Specifically, we construct the transport flow only on the target sequences to adapt the diffusion-based model to flow matching. Furthermore, we explore different settings and identify target-parameterization, self-conditioning and time-difference as three effective techniques for improving generation quality within a few steps. Experiments on four popular tasks demonstrate that FMSeq generates texts of comparable quality to the SOTA diffusion-based DiffuSeq in just 10 steps, achieving a 200-fold speedup.
2021
TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task
Han Yang | Bojie Hu | Wanying Xie | Ambyera Han | Pan Liu | Jinan Xu | Qi Ju
Proceedings of the Sixth Conference on Machine Translation
This paper describes TenTrans’ submission to the WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance of related high-resource languages. We mainly utilize back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve the translation quality. On the test set, our best submitted system achieves an average case-sensitive BLEU score of 43.45 across all low-resource pairs. The data, code, and pre-trained models used in this work are available in the TenTrans evaluation examples.