Pan Liu

2024

pdf bib abs
Enable Fast Sampling for Seq2Seq Text Diffusion
Pan Liu | Xiaohua Tian | Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Diffusion models exhibit promising capacity for generating high-quality text. However, owing to the curved nature of generation path, they necessitate traversing numerous steps to guarantee the text quality. In this paper, we propose an efficient model FMSeq, which utilizes flow matching to straighten the generation path, thereby enabling fast sampling for diffusion-based seq2seq text generation. Specifically, we construct transport flow only on the target sequences to adapt the diffusion-based model with flow matching. Furthermore, we explore different settings and identify target-parameterization, self-conditioning and time-difference as three effective techniques to improve the generation quality under a few steps. Experiments on four popular tasks demonstrate that FMSeq generates texts of comparable quality to the SOTA diffusion-based DiffuSeq in just 10 steps, achieving a 200-fold speedup.

2021

This paper describes TenTrans’ submission to WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance of related high-resource languages. We mainly utilize back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve the translation quality. On the test set, our best-submitted system achieves an average of 43.45 case-sensitive BLEU scores across all low-resource pairs. Our data, code, and pre-trained models used in this work are available in TenTrans evaluation examples.

Co-authors

Wanying Xie 1

Jinan Xu (徐金安) 1

Han Yang 1

Venues

findings1
wmt1

Fix data