Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding

Suyoung Kim, Jiyeon Hwang, Ho-Young Jung


Abstract
Recently, deep end-to-end learning has been studied for intent classification in Spoken Language Understanding (SLU). However, end-to-end models require large amounts of speech data with intent labels, and highly optimized models are generally sensitive to inconsistencies between training and evaluation conditions. Therefore, a natural language understanding approach based on Automatic Speech Recognition (ASR) remains attractive, because it can utilize a pre-trained general language model and adapt to mismatches in the speech input environment. Using this module-based approach, we improve a noisy-channel model to handle transcription inconsistencies caused by ASR errors. We propose a two-stage method, Contrastive and Consistency Learning (CCL), that correlates error patterns between clean and noisy ASR transcripts and emphasizes the consistency of the latent features of the two transcripts. Experiments on four benchmark datasets show that CCL outperforms existing methods and improves ASR robustness in various noisy environments. Code is available at https://github.com/syoung7388/CCL.
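To make the abstract's two-stage idea concrete, the following is a minimal PyTorch sketch of how a contrastive objective over paired clean/noisy ASR transcript embeddings might be combined with a latent-feature consistency term. The loss forms (InfoNCE and MSE), function names, and weights are illustrative assumptions, not the authors' implementation; see the linked repository for the actual CCL code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(clean_emb, noisy_emb, temperature=0.07):
    """InfoNCE-style loss (assumed form): each noisy-transcript embedding
    should be closest to its paired clean-transcript embedding in the batch."""
    clean = F.normalize(clean_emb, dim=-1)    # (B, D) unit-norm clean embeddings
    noisy = F.normalize(noisy_emb, dim=-1)    # (B, D) unit-norm noisy embeddings
    logits = noisy @ clean.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal pairs
    return F.cross_entropy(logits, targets)

def consistency_loss(clean_feat, noisy_feat):
    """Penalize divergence between latent features of the two transcripts;
    detaching the clean features treats them as a fixed target (a design
    assumption, not necessarily what CCL does)."""
    return F.mse_loss(noisy_feat, clean_feat.detach())

def ccl_objective(clean_emb, noisy_emb, intent_loss, alpha=1.0, beta=1.0):
    """Combined objective alongside the intent-classification loss;
    alpha and beta are illustrative weights."""
    return (intent_loss
            + alpha * contrastive_loss(clean_emb, noisy_emb)
            + beta * consistency_loss(clean_emb, noisy_emb))
```

Under this sketch, pulling the noisy embedding toward a detached clean target keeps the clean-transcript representation stable while the noisy one learns to match it, which is one common way to encode the robustness goal the abstract describes.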
Anthology ID:
2024.naacl-long.318
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5698–5711
URL:
https://aclanthology.org/2024.naacl-long.318
DOI:
10.18653/v1/2024.naacl-long.318
Cite (ACL):
Suyoung Kim, Jiyeon Hwang, and Ho-Young Jung. 2024. Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5698–5711, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding (Kim et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.318.pdf