Speech-Aware Multi-Domain Dialogue State Generation with ASR Error Correction Modules

Ridong Jiang, Wei Shi, Bin Wang, Chen Zhang, Yan Zhang, Chunlei Pan, Jung Jae Kim, Haizhou Li


Abstract
Prior research on dialogue state tracking (DST) is mostly based on written dialogue corpora. For spoken dialogues, the DST model trained on the written text should use the results (or hypothesis) of automatic speech recognition (ASR) as input. But ASR hypothesis often includes errors, which leads to significant performance drop for spoken dialogue state tracking. We address the issue by developing the following ASR error correction modules. First, we train a model to convert ASR hypothesis to ground truth user utterance, which can fix frequent patterns of errors. The model takes ASR hypotheses of two ASR models as input and fine-tuned in two stages. The corrected hypothesis is fed into a large scale pre-trained encoder-decoder model (T5) for DST training and inference. Second, if an output slot value from the encoder-decoder model is a name, we compare it with names in a dictionary crawled from Web sites and, if feasible, replace with the crawled name of the shortest edit distance. Third, we fix errors of temporal expressions in ASR hypothesis by using hand-crafted rules. Experiment results on the DSTC 11 speech-aware dataset, which is built on the popular MultiWOZ task (version 2.1), show that our proposed method can effectively mitigate the performance drop when moving from written text to spoken conversations.
Anthology ID:
2023.dstc-1.13
Volume:
Proceedings of The Eleventh Dialog System Technology Challenge
Month:
September
Year:
2023
Address:
Prague, Czech Republic
Editors:
Yun-Nung Chen, Paul Crook, Michel Galley, Sarik Ghazarian, Chulaka Gunasekara, Raghav Gupta, Behnam Hedayatnia, Satwik Kottur, Seungwhan Moon, Chen Zhang
Venues:
DSTC | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
105–112
Language:
URL:
https://aclanthology.org/2023.dstc-1.13
DOI:
Bibkey:
Cite (ACL):
Ridong Jiang, Wei Shi, Bin Wang, Chen Zhang, Yan Zhang, Chunlei Pan, Jung Jae Kim, and Haizhou Li. 2023. Speech-Aware Multi-Domain Dialogue State Generation with ASR Error Correction Modules. In Proceedings of The Eleventh Dialog System Technology Challenge, pages 105–112, Prague, Czech Republic. Association for Computational Linguistics.
Cite (Informal):
Speech-Aware Multi-Domain Dialogue State Generation with ASR Error Correction Modules (Jiang et al., DSTC-WS 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.dstc-1.13.pdf