The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Mutian He, Philip Garner


Abstract
End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desired for speech models that work on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that the task of speech translation (ST) is a good means of pretraining speech models for end-to-end SLU on both intra- and cross-lingual scenarios. By introducing ST, our models reach higher performance over baselines on monolingual and multilingual intent classification as well as spoken question answering using SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also create new benchmark datasets from both synthetic and real sources, for speech summarization and low-resource/zero-shot transfer from English to French or Spanish. We further show the value of preserving knowledge for the ST pretraining task for better downstream performance, possibly using Bayesian transfer regularizers.
Anthology ID:
2023.findings-emnlp.291
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4408–4423
URL:
https://aclanthology.org/2023.findings-emnlp.291
DOI:
10.18653/v1/2023.findings-emnlp.291
Cite (ACL):
Mutian He and Philip Garner. 2023. The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4408–4423, Singapore. Association for Computational Linguistics.
Cite (Informal):
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation (He & Garner, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.291.pdf