Shuguang Liu


pdf bib
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
Shengming Yin | Chenfei Wu | Huan Yang | Jianfeng Wang | Xiaodong Wang | Minheng Ni | Zhengyuan Yang | Linjie Li | Shuguang Liu | Fan Yang | Jianlong Fu | Ming Gong | Lijuan Wang | Zicheng Liu | Houqiang Li | Nan Duan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation. Most current work generates long videos segment by segment sequentially, which normally leads to the gap between training on short videos and inferring long videos, and the sequential generation is inefficient. Instead, our approach adopts a “coarse-to-fine” process, in which the video can be generated in parallel at the same granularity. A global diffusion model is applied to generate the keyframes across the entire time range, and then local diffusion models recursively fill in the content between nearby frames. This simple yet effective strategy allows us to directly train on long videos (3376 frames) to reduce the training-inference gap and makes it possible to generate all segments in parallel. To evaluate our model, we build FlintstonesHD dataset, a new benchmark for long video generation. Experiments show that our model not only generates high-quality long videos with both global and local coherence, but also decreases the average inference time from 7.55min to 26s (by 94.26%) at the same hardware setting when generating 1024 frames. The homepage link is [NUWA-XL](


pdf bib
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
Yaobo Liang | Nan Duan | Yeyun Gong | Ning Wu | Fenfei Guo | Weizhen Qi | Ming Gong | Linjun Shou | Daxin Jiang | Guihong Cao | Xiaodong Fan | Ruofei Zhang | Rahul Agrawal | Edward Cui | Sining Wei | Taroon Bharti | Ying Qiao | Jiun-Hung Chen | Winnie Wu | Shuguang Liu | Fan Yang | Daniel Campos | Rangan Majumder | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE (Wang et al.,2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.