噪声鲁棒的蒙古语语音数据增广模型结构(Noise robust Mongolian speech data augmentation model structure)

Zhiqiang Ma (马志强), Jiaqi Sun (孙佳琦), Jinyi Li (李晋益), Jiatai Wang (王嘉泰)


Abstract
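“Mongolian speech corpora lack speech diversity. Spending manpower and funds on data collection can increase the amount of speech to some extent, but the whole process takes a great deal of time. Data augmentation can address this data scarcity, but the environmental noise contained in the training data of a data augmentation model cannot be controlled, so the augmented speech contains background noise. This paper proposes a speech data augmentation method that combines TTS with speech enhancement and, working from the speech spectrogram, performs speech enhancement along both the frequency-domain and time-domain dimensions. Multiple groups of experiments show that the pass rate of augmented Mongolian speech reaches 70%, that the CBAK and COVL of the augmented speech decrease by 0.66 and 0.81 respectively, and that WER and SER decrease by 2.75% and 2.05%.”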
Anthology ID:
2023.ccl-1.14
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
155–165
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-1.14
Cite (ACL):
Zhiqiang Ma, Jiaqi Sun, Jinyi Li, and Jiatai Wang. 2023. 噪声鲁棒的蒙古语语音数据增广模型结构(Noise robust Mongolian speech data augmentation model structure). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 155–165, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
噪声鲁棒的蒙古语语音数据增广模型结构(Noise robust Mongolian speech data augmentation model structure) (Ma et al., CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-1.14.pdf