Shuyan Dong


2023

pdf bib
Introducing Semantics into Speech Encoders
Derek Xu | Shuyan Dong | Changhan Wang | Suyoun Kim | Zhaojiang Lin | Bing Liu | Akshat Shrivastava | Shang-Wen Li | Liang-Hsuan Tseng | Guan-Ting Lin | Alexei Baevski | Hung-yi Lee | Yizhou Sun | Wei Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which is expensive and time-consuming to obtain. We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoder spoken language understanding (SLU) performance by over 5% on intent classification (IC), with modest gains in named entity resolution (NER) and slot filling (SF), and spoken question answering (SQA) FF1 score by over 2%. Our approach, which uses no ASR data, achieves similar performance as methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.

2022

pdf bib
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai | Heng-Jui Chang | Wen-Chin Huang | Zili Huang | Kushal Lakhotia | Shu-wen Yang | Shuyan Dong | Andy Liu | Cheng-I Lai | Jiatong Shi | Xuankai Chang | Phil Hall | Hsuan-Jui Chen | Shang-Wen Li | Shinji Watanabe | Abdelrahman Mohamed | Hung-yi Lee
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focusing on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all researchers, and encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.

2021

pdf bib
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing
Hung-Yi Lee | Mitra Mohtarami | Shang-Wen Li | Di Jin | Mandy Korpusik | Shuyan Dong | Ngoc Thang Vu | Dilek Hakkani-Tur
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

pdf bib
Optimizing NLU Reranking Using Entity Resolution Signals in Multi-domain Dialog Systems
Tong Wang | Jiangning Chen | Mohsen Malmir | Shuyan Dong | Xin He | Han Wang | Chengwei Su | Yue Liu | Yang Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In dialog systems, the Natural Language Understanding (NLU) component typically makes the interpretation decision (including domain, intent and slots) for an utterance before the mentioned entities are resolved. This may result in intent classification and slot tagging errors. In this work, we propose to leverage Entity Resolution (ER) features in NLU reranking and introduce a novel loss term based on ER signals to better learn model weights in the reranking framework. In addition, for a multi-domain dialog scenario, we propose a score distribution matching method to ensure scores generated by the NLU reranking models for different domains are properly calibrated. In offline experiments, we demonstrate our proposed approach significantly outperforms the baseline model on both single-domain and cross-domain evaluations.