融合外部语言知识的流式越南语语音识别(Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge)

Junqiang Wang (王俊强), Zhengtao Yu (余正涛), Ling Dong (董凌), Shengxiang Gao (高盛祥), Wenjun Wang (王文君)


Abstract
“越南语为低资源语言,训练语料难以获取;流式端到端模型在训练过程中难以学习到外部大量文本中的语言知识,这些问题在一定程度上都限制了流式越南语语音识别模型的性能。因此,本文以越南语音节作为语言模型和流式越南语语音识别模型的建模单元,提出了一种将预训练越南语语言模型在训练阶段融合到流式语音识别模型的方法。在训练阶段,通过最小化预训练越南语语言模型和解码器的输出计算一个新的损失函数LAE D−LM ,帮助流式越南语语音识别模型学习一些越南语语言知识从而优化其模型参数;在解码阶段,使用孓孨孡孬孬孯孷 孆孵孳孩孯孮或者字孆孓孔技术再次融合预训练语言模型进一步提升模型识别率。实验结果表明,在孖孉孖孏孓数据集上,相比基线模型,在训练阶段融合语言模型可以将流式越南语语音识别模型的词错率提升嬲嬮嬴嬵嬥;在解码阶段使用孓孨孡孬孬孯孷 孆孵孳孩孯孮或字孆孓孔再次融合语言模型,还可以将模型词错率分别提升嬱嬮嬳嬵嬥和嬴嬮嬷嬵嬥。”
Anthology ID:
2022.ccl-1.53
Volume:
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Nanchang, China
Editors:
Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
591–599
Language:
Chinese
URL:
https://aclanthology.org/2022.ccl-1.53
DOI:
Bibkey:
Cite (ACL):
Junqiang Wang, Zhengtao Yu, Ling Dong, Shengxiang Gao, and Wenjun Wang. 2022. 融合外部语言知识的流式越南语语音识别(Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 591–599, Nanchang, China. Chinese Information Processing Society of China.
Cite (Informal):
融合外部语言知识的流式越南语语音识别(Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge) (Wang et al., CCL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.ccl-1.53.pdf