Lijinlong


2025

"Self-supervised learning (SSL) speech models have achieved remarkable performance across various tasks, with the learned representations often exhibiting a high degree of generality and applicability to multiple downstream tasks. However, these representations contain both speech content and some paralinguistic information, which may be redundant for content-focused tasks.Decoupling this redundant information is challenging. To address this issue, we propose a Self-Supervised Contrastive Representation Learning method (SSCRL), which effectively disentangles paralinguistic information from speech content by aligning similar content speech representations in the feature space using self-supervised contrastive learning with pitch perturbation and speaker perturbation features. Experimental results demonstrate that the proposed method, when fine-tuned on the LibriSpeech 100-hour dataset, achieves superior performance across all content-related tasks in the SUPERB Benchmark, generally outperforming prior approaches."