基于注意力的蒙古语说话人特征提取方法(Attention based Mongolian Speaker Feature Extraction)

Fangyuan Zhu (朱方圆), Zhiqiang Ma (马志强), Zhiqiang Liu (刘志强), Caijilahu Bao (宝财吉拉呼), Hongbin Wang (王洪彬)


Abstract
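Speaker features produced by speaker feature extraction models are poorly discriminative, so Mongolian acoustic models cannot learn discriminative information and therefore fail to adapt to different speakers. We propose an attention-based speaker adaptation method. The method introduces a neural Turing machine for adaptation, adds a memory module that stores speaker features, and uses an attention mechanism to compute a similarity weight matrix between the speaker features in the memory module and the speaker features of the current utterance. The weight matrix is then used to recombine the stored features into a speaker feature, the s-vector, which improves the discriminability among speaker features. On the IMUT-MCT dataset, we carry out ablation experiments on the speaker feature extraction method, model adaptation experiments, and a case analysis. The results show that, among the speaker features s-vector, i-vector, and d-vector, the s-vector lowers SER and WER by 4.96% and 1.08%, respectively, compared with the other two; across different Mongolian acoustic models, the proposed method improves on the baseline in every case.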
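The memory read described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the softmax normalization, the `sharpness` scaling factor, the function names, and the toy three-dimensional vectors are all assumptions chosen for illustration, since the abstract only states that attention weights over stored speaker features are used to recombine them into an s-vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def attend_memory(memory, query, sharpness=5.0):
    """Recombine stored vectors into one vector using attention weights.

    memory: list of stored speaker feature vectors (hypothetical layout)
    query: speaker feature of the current utterance
    sharpness: assumed scaling factor; larger values give peakier weights
    """
    # One similarity score per memory slot, scaled before normalization.
    scores = [sharpness * cosine(row, query) for row in memory]
    weights = softmax(scores)
    # Weighted sum of the memory rows, computed per dimension.
    dim = len(query)
    combined = [
        sum(w * row[i] for w, row in zip(weights, memory)) for i in range(dim)
    ]
    return weights, combined


# Toy data: three stored speaker features and one current-utterance feature.
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.9, 0.1, 0.0]
weights, s_vector = attend_memory(memory, query)
```

In this toy setup the query lies closest to the first memory slot, so that slot receives the largest weight and dominates the recombined vector. A real system would operate on learned embeddings and trained parameters rather than hand-written vectors.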
Anthology ID:
2022.ccl-1.31
Volume:
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Nanchang, China
Editors:
Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
344–354
Language:
Chinese
URL:
https://aclanthology.org/2022.ccl-1.31
Cite (ACL):
Fangyuan Zhu, Zhiqiang Ma, Zhiqiang Liu, Caijilahu Bao, and Hongbin Wang. 2022. 基于注意力的蒙古语说话人特征提取方法(Attention based Mongolian Speaker Feature Extraction). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 344–354, Nanchang, China. Chinese Information Processing Society of China.
Cite (Informal):
基于注意力的蒙古语说话人特征提取方法(Attention based Mongolian Speaker Feature Extraction) (Zhu et al., CCL 2022)
PDF:
https://aclanthology.org/2022.ccl-1.31.pdf