Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network

Chi-En Dai, Qi-Wei Hong, Jeih-Weih Hung


Abstract
This study aims to improve DEMUCS, a highly effective speech enhancement technique, by revising its training loss function. DEMUCS, developed by Facebook, is built on Wave-U-Net and consists of convolutional encoder and decoder blocks with an LSTM layer in between. Although DEMUCS processes the input speech utterance purely in the time (waveform) domain, its loss function combines a waveform-domain L1 distance with a multi-scale short-time Fourier transform (STFT) loss; that is, both time- and frequency-domain features are taken into account during learning. In this study, we propose revising the STFT loss in DEMUCS by employing a compressed magnitude spectrogram, where the compression is performed either by a power-law operation with a positive exponent less than one or by a logarithmic operation. We evaluate the proposed framework on the VoiceBank-DEMAND database and task. Preliminary experimental results suggest that DEMUCS with the power-law compressed magnitude spectral loss outperforms the original DEMUCS, yielding test utterances with higher objective quality and intelligibility scores (PESQ and STOI). In contrast, the logarithm-compressed magnitude spectral loss does not benefit DEMUCS. These results indicate that DEMUCS can be further improved by properly revising the STFT terms of its loss function.
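The core idea of the abstract — an L1 distance between power-law-compressed magnitude spectrograms — can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual code: the frame length, hop size, window choice, and exponent value are assumptions, and the paper additionally uses this loss at multiple STFT resolutions alongside a waveform-domain L1 term.

```python
import numpy as np

def compressed_stft_mag_loss(reference, estimate, n_fft=512, hop=128, p=0.3):
    """L1 distance between power-law-compressed magnitude spectrograms.

    Illustrative sketch: n_fft, hop, and the exponent p are hypothetical
    choices, not values taken from the paper.
    """
    def mag_spectrogram(x):
        # Frame the waveform with a Hann window and take FFT magnitudes.
        win = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * win
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

    # Power-law compression with exponent p < 1 reduces the dynamic range,
    # giving low-energy time-frequency bins more weight in the loss.
    ref_c = mag_spectrogram(reference) ** p
    est_c = mag_spectrogram(estimate) ** p
    return np.mean(np.abs(ref_c - est_c))

# Usage: compare a clean signal against a noisy copy of itself.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)
noisy = clean + 0.1 * rng.standard_normal(4000)
print(compressed_stft_mag_loss(clean, noisy))
```

The logarithmic variant the paper also evaluates would replace the `** p` compression with something like `np.log(mag + eps)`; per the abstract, that variant did not improve over the baseline.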
Anthology ID:
2022.rocling-1.13
Volume:
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Yung-Chun Chang, Yi-Chin Huang
Venue:
ROCLING
SIG:
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Note:
Pages:
100–106
Language:
Chinese
URL:
https://aclanthology.org/2022.rocling-1.13
DOI:
Bibkey:
Cite (ACL):
Chi-En Dai, Qi-Wei Hong, and Jeih-Weih Hung. 2022. Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 100–106, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network (Dai et al., ROCLING 2022)
PDF:
https://aclanthology.org/2022.rocling-1.13.pdf