MelTrim: Coarse-to-Fine Data Pruning for Speech Classification

Shaobo Wang; Tianle Niu; Xuan Ouyang; Xintong Li; Zhengkun Ge; Yue Min; Xiaoqian Liu; Hankun Wang; Linfeng Zhang

Abstract

Dataset Pruning (DP) aims to construct a coreset that achieves performance comparable to that of the original, full dataset. However, few studies have explored DP for Speech Classification (SC) tasks. Unlike image or text classification, SC is particularly challenging because acoustic, semantic, and contextual representations are difficult to capture. In this study, we propose a novel dataset pruning method for speech datasets, termed MelTrim, which uses a two-step coarse-to-fine framework designed to address these challenges. In Step 1, MelTrim coarsely filters utterance-level redundant samples by applying DBSCAN clustering to Mel-Frequency Cepstral Coefficient (MFCC) features, which are first flattened and then reduced in dimensionality with UMAP. In Step 2, MelTrim performs frame-level redundancy pruning on each utterance via utility pruning, which aims to eliminate irrelevant frames within that utterance. To the best of our knowledge, this is the first dataset pruning approach designed for Speech Classification tasks, and it demonstrates outstanding performance compared to classical general DP methods. Notably, for Speech Emotion Recognition, our method achieves up to a 49.5% improvement in WA (Weighted Accuracy) on the MEAD dataset. For Speaker Identification, it yields a 41.9% reduction in EER (Equal Error Rate) on the VoxCeleb1 dataset.
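
The coarse Step 1 described in the abstract (flatten the features, reduce dimensionality, cluster, drop redundant cluster members) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the MFCC features are already computed and padded to a common shape, it swaps in PCA from scikit-learn as a stand-in for UMAP so that the example needs no extra dependency, and the rule of keeping the one member nearest each cluster centroid is a hypothetical choice; the abstract does not say how redundant samples are selected. The function name `coarse_prune` and all parameter values are likewise illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def coarse_prune(features, n_components=2, eps=0.5, min_samples=2):
    """Return sorted indices of utterances kept after cluster-level pruning.

    features: array of shape (n_utterances, n_frames, n_coeffs),
    for example padded MFCC matrices (assumed to be precomputed).
    """
    # Flatten each utterance's (frames x coefficients) matrix into one vector.
    flat = features.reshape(len(features), -1)

    # Reduce dimensionality. ASSUMPTION: PCA is a stand-in; the paper uses UMAP.
    reduced = PCA(n_components=n_components, random_state=0).fit_transform(flat)

    # Density-based clustering; the label -1 marks points DBSCAN treats as noise.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)

    keep = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if label == -1:
            # Noise points have no dense neighbourhood, so keep all of them.
            keep.extend(members.tolist())
            continue
        # HYPOTHETICAL rule: keep only the member closest to the cluster centroid.
        centroid = reduced[members].mean(axis=0)
        dists = np.linalg.norm(reduced[members] - centroid, axis=1)
        keep.append(int(members[np.argmin(dists)]))
    return sorted(keep)
```

On a toy set of three groups of near-duplicate utterances, this sketch keeps one representative per group. Real speech data would need `eps` and `min_samples` tuned to the scale of the reduced features.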

Anthology ID: 2026.findings-acl.672
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13751–13765
URL: https://aclanthology.org/2026.findings-acl.672/
Cite (ACL): Shaobo Wang, Tianle Niu, Xuan Ouyang, Xintong Li, Zhengkun Ge, Yue Min, Xiaoqian Liu, Hankun Wang, and Linfeng Zhang. 2026. MelTrim: Coarse-to-Fine Data Pruning for Speech Classification. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13751–13765, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MelTrim: Coarse-to-Fine Data Pruning for Speech Classification (Wang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.672.pdf
Checklist: 2026.findings-acl.672.checklist.pdf
