PALM: Few-Shot Prompt Learning for Audio Language Models

Asif Hanif; Maha Tufail Agro; Mohammad Areeb Qazi; Hanan Aldarmaki

doi:10.18653/v1/2024.emnlp-main.1030

Abstract

Audio-Language Models (ALMs) have recently achieved remarkable success in zero-shot audio recognition, in which features of audio waveforms are matched against class-specific text prompt features, an approach inspired by advancements in Vision-Language Models (VLMs). Because zero-shot performance is sensitive to the choice of hand-crafted text prompts, many prompt learning techniques have been developed for VLMs. We explore the efficacy of these approaches in ALMs and propose a novel method, Prompt Learning in Audio Language Models (PALM), which optimizes the feature space of the text encoder branch. Unlike existing methods, which work in the input space, our approach results in greater training efficiency. We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup. Our method is either on par with or outperforms other approaches while being computationally less demanding. Our code is publicly available at https://asif-hanif.github.io/palm/.
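
The zero-shot matching step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random vectors stand in for encoder outputs, and the names `zero_shot_predict` and `adjust_text_features` are hypothetical. The additive per-class offset is only one assumed way of modifying text features after the encoder; the abstract does not specify the exact form PALM uses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def zero_shot_predict(audio_features, text_features):
    """Assign each audio clip to the class whose text feature is most similar.

    audio_features: (num_clips, dim) array, stand-in for audio encoder outputs.
    text_features:  (num_classes, dim) array, stand-in for text encoder outputs
                    computed from class-specific prompts.
    """
    a = l2_normalize(audio_features)
    t = l2_normalize(text_features)
    logits = a @ t.T  # cosine similarity of every clip with every class
    return logits.argmax(axis=1), logits


def adjust_text_features(text_features, class_offsets):
    """Hypothetical feature-space adjustment (an assumption, not PALM itself).

    Adds a per-class vector to the text features. Because the change is made
    after the text encoder, tuning class_offsets would not require gradients
    to flow back through the encoder.
    """
    return text_features + class_offsets


# Toy data: three classes, and three clips lying close to classes 2, 0 and 1.
rng = np.random.default_rng(0)
num_classes, dim = 3, 8
text_features = rng.normal(size=(num_classes, dim))
audio_features = text_features[[2, 0, 1]] + 0.01 * rng.normal(size=(3, dim))

preds, logits = zero_shot_predict(audio_features, text_features)

# With all-zero offsets the adjusted features equal the zero-shot features.
offsets = np.zeros_like(text_features)
adjusted_preds, _ = zero_shot_predict(
    audio_features, adjust_text_features(text_features, offsets)
)
```

In a few-shot setup, a vector such as `offsets` would be fitted on a handful of labeled clips per class while both encoders stay frozen. Input-space methods instead learn prompt token embeddings that must pass through the text encoder at every training step.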

Anthology ID: 2024.emnlp-main.1030
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 18527–18536
URL: https://aclanthology.org/2024.emnlp-main.1030/
DOI: 10.18653/v1/2024.emnlp-main.1030
Cite (ACL): Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, and Hanan Aldarmaki. 2024. PALM: Few-Shot Prompt Learning for Audio Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18527–18536, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): PALM: Few-Shot Prompt Learning for Audio Language Models (Hanif et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.1030.pdf
