Zhichang Wang


2024

Zero-Shot Spoken Language Understanding via Large Language Models: A Preliminary Study
Zhihong Zhu | Xuxin Cheng | Hao An | Zhichang Wang | Dongsheng Chen | Zhiqi Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Zero-shot Spoken Language Understanding (SLU) aims to enable task-oriented dialogue systems to understand user needs without training data. Challenging but worthwhile, zero-shot SLU reduces the time and effort that data labeling takes. Recent advancements in large language models (LLMs), such as GPT-3.5 and ChatGPT, have shown promising results in zero-shot settings, which motivates us to explore prompt-based methods. In this study, we investigate whether strong SLU models can be constructed by directly prompting LLMs. Specifically, we propose a simple yet effective two-stage framework dubbed GPT-SLU, which transforms the SLU task into a question-answering problem. Powered by multi-stage mutual guided prompts, GPT-SLU can leverage the correlations between the two SLU subtasks, which have been extensively explored in the traditional fine-tuning paradigm, to achieve better predictions. Experimental results on three SLU benchmark datasets demonstrate the significant potential of LLMs for zero-shot SLU. Comprehensive analyses validate the effectiveness of our proposed framework and also indicate that there is still room for further improvement of LLMs in SLU scenarios.

MaCSC: Towards Multimodal-augmented Pre-trained Language Models via Conceptual Prototypes and Self-balancing Calibration
Xianwei Zhuang | Zhichang Wang | Xuxin Cheng | Yuxin Xie | Liming Liang | Yuexian Zou
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Pre-trained language models (PLMs) that rely solely on textual data may exhibit limitations in multimodal semantics comprehension. Existing solutions attempt to alleviate this issue by incorporating explicit image retrieval or generation techniques. However, these methods: (1) focus exclusively on the static image modality; (2) inevitably encounter modality gaps and noise; (3) indiscriminately treat all modalities. In this paper, we propose a novel multimodal-augmented framework termed MaCSC, which can infuse multimodal semantics into PLMs and facilitate a self-balancing calibration of information allocation. Specifically, MaCSC obtains modal-specific conceptual prototypes from contrastive pre-training models (e.g., CLIP), and aggregates the intra- and inter-modal semantics of the conceptual prototype to enhance PLMs. In addition, we utilize a novel self-balancing contrastive loss to achieve multi-scale self-balancing calibration of multimodal information during fine-tuning PLMs. Experimental results show that MaCSC consistently improves the performance of PLMs across various architectures and scales, and outperforms competitive baselines on multiple NLP tasks.