PromptST: Abstract Prompt Learning for End-to-End Speech Translation

Tengfei Yu, Liang Ding, Xuebo Liu, Kehai Chen, Meishan Zhang, Dacheng Tao, Min Zhang


Abstract
An end-to-end speech-to-text (S2T) translation model is usually initialized from a pre-trained speech recognition encoder and a pre-trained text-to-text (T2T) translation decoder. Although this straightforward setting has been shown to be empirically successful, clear answers are still missing for two research questions: 1) how are speech and text modalities fused in S2T models, and 2) how can the two modalities be fused better? In this paper, we take the first step toward understanding the fusion of speech and text features in S2T models. We first design and release a 10GB linguistic probing benchmark, namely Speech-Senteval, to investigate the acoustic and linguistic behaviors of S2T models. Preliminary analysis reveals that the uppermost encoder layers of the S2T model cannot learn linguistic knowledge efficiently, which is crucial for accurate translation. Based on this finding, we further propose a simple plug-in prompt-learning strategy on the uppermost encoder layers to broaden the abstract representation power of the encoder of S2T models. We call such a prompt-enhanced S2T model PromptST. Experimental results on four widely-used S2T datasets show that PromptST can deliver significant improvements over a strong baseline by capturing richer linguistic knowledge. Benchmarks, code, and scripts are freely available at https://github.com/ytf-philp/PromptST.
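The plug-in strategy described above can be pictured as prepending trainable prompt vectors to the hidden states only at the uppermost encoder layers. The toy sketch below illustrates that idea in pure Python; all names (`PROMPT_FROM`, `encode`, `make_prompts`) and the identity stand-in for a transformer layer are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of layer-wise prompt injection at the uppermost
# encoder layers. Names and shapes are hypothetical; a real implementation
# would use trainable embeddings inside a transformer encoder.

PROMPT_LEN = 4    # prompt vectors prepended per prompted layer
HIDDEN = 8        # toy hidden size
NUM_LAYERS = 6
PROMPT_FROM = 4   # only layers 4..5 (the uppermost ones) receive prompts

def make_prompts():
    """One trainable prompt block per prompted layer (zeros as placeholders)."""
    return {
        layer: [[0.0] * HIDDEN for _ in range(PROMPT_LEN)]
        for layer in range(PROMPT_FROM, NUM_LAYERS)
    }

def encoder_layer(states):
    """Stand-in for a transformer encoder layer (identity transform here)."""
    return states

def encode(states, prompts):
    """Run the encoder stack, prepending prompts at the uppermost layers
    and stripping them afterwards so the output length matches the input."""
    for layer in range(NUM_LAYERS):
        if layer in prompts:
            states = prompts[layer] + states  # prepend prompt vectors
            states = encoder_layer(states)
            states = states[PROMPT_LEN:]      # drop the prompt positions
        else:
            states = encoder_layer(states)
    return states

speech_states = [[1.0] * HIDDEN for _ in range(10)]  # 10 speech frames
out = encode(speech_states, make_prompts())
print(len(out), len(out[0]))  # → 10 8 (sequence length preserved)
```

The key design choice this sketch captures is that lower layers are left untouched: only the layers that the probing analysis found to be linguistically weak receive extra trainable capacity.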
Anthology ID:
2023.emnlp-main.627
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10140–10154
URL:
https://aclanthology.org/2023.emnlp-main.627
DOI:
10.18653/v1/2023.emnlp-main.627
Cite (ACL):
Tengfei Yu, Liang Ding, Xuebo Liu, Kehai Chen, Meishan Zhang, Dacheng Tao, and Min Zhang. 2023. PromptST: Abstract Prompt Learning for End-to-End Speech Translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10140–10154, Singapore. Association for Computational Linguistics.
Cite (Informal):
PromptST: Abstract Prompt Learning for End-to-End Speech Translation (Yu et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.627.pdf
Video:
https://aclanthology.org/2023.emnlp-main.627.mp4