Shang-Tse Chen
2024
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
Hsuan Su | Hua Farn | Fan-Yun Sun | Shang-Tse Chen | Hung-yi Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Synthetic data is widely used in speech recognition because text-to-speech models make it easy to adapt models to previously unseen text domains. However, existing methods lose performance when they fine-tune an automatic speech recognition (ASR) model on synthetic data, because of a distributional shift commonly referred to as the synthetic-to-real gap. In this paper, we find that task arithmetic is effective at mitigating this gap. Our proposed method, the SYN2REAL task vector, improves word error rate by an average of 10.03% over baselines on the SLURP dataset. Additionally, we show that when real speech from multiple different domains is available, averaging SYN2REAL task vectors can further adapt the original ASR model to perform better on the target text domain.
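The abstract does not spell out how a SYN2REAL task vector is built. As a hypothetical reading only, the sketch below treats it as the parameter difference between two copies of an ASR model, one fine-tuned on real speech and one on synthetic speech, and adds that difference to a model fine-tuned on synthetic target-domain speech; it also averages several such vectors, as the abstract describes for multiple real-speech domains. Models are plain dicts of float lists standing in for real checkpoints, and `task_vector`, `average_vectors`, `apply_vector`, and the `scale` factor are illustrative names, not the authors' code.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference finetuned - base for every parameter."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])] for name in base}


def average_vectors(vectors: List[Params]) -> Params:
    """Element-wise mean of several task vectors that share names and shapes."""
    n = len(vectors)
    names = vectors[0].keys()
    return {
        name: [sum(vals) / n for vals in zip(*(v[name] for v in vectors))]
        for name in names
    }


def apply_vector(model: Params, vector: Params, scale: float = 1.0) -> Params:
    """Return model + scale * vector; `scale` is an assumed tuning knob."""
    return {name: [p + scale * d for p, d in zip(model[name], vector[name])] for name in model}


if __name__ == "__main__":
    # Assumption: both models were fine-tuned from the same pretrained ASR model.
    theta_real = {"w": [1.0, 2.0]}  # fine-tuned on real speech
    theta_syn = {"w": [0.5, 1.0]}   # fine-tuned on synthetic speech
    syn2real = task_vector(theta_real, theta_syn)

    theta_target = {"w": [0.0, 0.0]}  # fine-tuned on synthetic target-domain speech
    print(apply_vector(theta_target, syn2real))  # {'w': [0.5, 1.0]}
```

With real checkpoints the same arithmetic would run over every tensor in a state dict; averaging vectors from several source domains before applying them smooths out domain-specific differences.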
2023
Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue
Hsuan Su | Shachi H. Kumar | Sahisnu Mazumder | Wenda Chen | Ramesh Manuvinakurike | Eda Okur | Saurav Sahay | Lama Nachman | Shang-Tse Chen | Hung-yi Lee
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
With the power of large pretrained language models, many research works have integrated knowledge into dialogue systems. Traditional techniques treat knowledge as part of the dialogue system's input sequence, prepending a set of knowledge statements to the dialogue history. However, this mechanism forces the knowledge statements to be concatenated in a fixed order, which implicitly leads models to pay imbalanced attention to them during training. In this paper, we first investigate how the order of the knowledge set influences the responses of autoregressive dialogue systems. We conduct experiments on two commonly used dialogue datasets with two types of transformer-based models and find that models treat the input knowledge unequally. To address this, we propose a simple and novel technique that alleviates the order effect by modifying the position embeddings of the knowledge input in these models. With the proposed position embedding method, experimental results show that each knowledge statement is considered uniformly when generating responses.
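The abstract says only that the position embeddings of the knowledge input are modified. One hypothetical way to remove the order signal from positions, sketched below as an assumption rather than the authors' exact scheme, is to restart the position ids at 0 for every knowledge statement and let the dialogue history continue after the longest statement. The functions `order_invariant_position_ids` and `sequential_position_ids` are illustrative names; their output would be passed as position ids to a transformer.

```python
from typing import List


def sequential_position_ids(knowledge_lengths: List[int], history_length: int) -> List[int]:
    """Conventional scheme: one running index over the concatenated sequence."""
    return list(range(sum(knowledge_lengths) + history_length))


def order_invariant_position_ids(knowledge_lengths: List[int], history_length: int) -> List[int]:
    """Hypothetical scheme: each knowledge statement restarts at position 0,
    so a statement's position ids do not depend on where it appears in the list."""
    ids: List[int] = []
    for length in knowledge_lengths:
        ids.extend(range(length))  # positions 0 .. length-1 for this statement
    # Dialogue history continues after the longest knowledge statement.
    start = max(knowledge_lengths, default=0)
    ids.extend(range(start, start + history_length))
    return ids


if __name__ == "__main__":
    # Two knowledge statements of 3 and 2 tokens, followed by 2 history tokens.
    print(sequential_position_ids([3, 2], 2))       # [0, 1, 2, 3, 4, 5, 6]
    print(order_invariant_position_ids([3, 2], 2))  # [0, 1, 2, 0, 1, 3, 4]
```

Under this scheme, swapping the two statements leaves each statement's own position ids unchanged. Positions are only one source of order information: with causal attention, a later statement can still attend to an earlier one, so a full solution may also need to adjust the attention mask.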