TextBox 2.0: A Text Generation Library with Pre-trained Language Models

Tianyi Tang, Junyi Li, Zhipeng Chen, Yiwen Hu, Zhuohao Yu, Wenxun Dai, Wayne Xin Zhao, Jian-yun Nie, Ji-rong Wen


Abstract
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their 83 corresponding datasets, and further incorporates 45 PLMs, spanning general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight models. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design interfaces that support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be carried out in a uniform way. Despite its rich functionality, the library is easy to use, through either a friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at: https://github.com/RUCAIBox/TextBox#2.0.
Anthology ID:
2022.emnlp-demos.42
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Wanxiang Che, Ekaterina Shutova
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
435–444
URL:
https://aclanthology.org/2022.emnlp-demos.42
DOI:
10.18653/v1/2022.emnlp-demos.42
Cite (ACL):
Tianyi Tang, Junyi Li, Zhipeng Chen, Yiwen Hu, Zhuohao Yu, Wenxun Dai, Wayne Xin Zhao, Jian-yun Nie, and Ji-rong Wen. 2022. TextBox 2.0: A Text Generation Library with Pre-trained Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 435–444, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
TextBox 2.0: A Text Generation Library with Pre-trained Language Models (Tang et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-demos.42.pdf