Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, and Zhou Zhao. December 2023. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (eds. Houda Bouamor, Juan Pino, and Kalika Bali), pages 15957–15969, Singapore. Association for Computational Linguistics. Anthology ID: liu-etal-2023-vit. DOI: 10.18653/v1/2023.emnlp-main.990. URL: https://aclanthology.org/2023.emnlp-main.990/