基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding)

Yuqing Cai (蔡郁青), Chao Wang (王超), Duojie Renzeng (仁增多杰), Yulei Zhu (朱宇雷), Jin Zhang (张瑾), Tashi Nyima (尼玛扎西)

Abstract

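“End-to-end Tibetan speech recognition suffers from inconsistent modeling units and unsatisfactory recognition performance. To address these problems, this paper proposes a BPE-Conformer-CTC/Attention method for end-to-end Tibetan speech recognition. First, the method uses the byte pair encoding (BPE) algorithm for speech modeling. BPE repeatedly merges the most frequent character pair, which splits text into manageable, meaningful units and balances the granularity of the modeling units. This resolves the inconsistency of modeling units in Tibetan speech recognition. Second, a Conformer encoder fuses the global and local dependencies of the audio sequence, which strengthens the model's representational capacity. Finally, a joint CTC/Attention decoding strategy speeds up alignment and decoding, improving both the accuracy and the efficiency of recognition. Experiments on the open-source datasets XBMU-AMDO31 and TIBMD@MUCI show that the proposed BPE-Conformer-CTC/Attention model achieves word error rates of 9.0% and 4.6%, respectively. Compared with the baseline Transformer-CTC/Attention model, these are relative word error rate reductions of 14.2% and 30.3%. The method offers an effective solution for end-to-end Tibetan speech recognition.”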
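The BPE step summarized in the abstract repeatedly merges the most frequent adjacent symbol pair. The minimal Python sketch below illustrates that step. It is a generic, textbook-style BPE learner and not the authors' implementation. The function names `learn_bpe`, `merge_symbols` and `apply_bpe` are hypothetical, and so is the toy English corpus. The sketch also omits the end-of-word markers used in some BPE variants. The merge count and tokenizer settings used in the paper are not given here.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` (scanning left to right) with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of word strings (hypothetical helper)."""
    # Start from single characters (Unicode code points), weighted by word frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the weighted vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the new merge applied.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_symbols(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=4)
    print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
    print(apply_bpe("lowest", merges))  # ['low', 'est']
```

Python strings index by Unicode code point, so these functions also run unchanged on Tibetan text. A Tibetan stacked consonant spans several code points, however, so the initial symbols would be code points rather than visual graphemes.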

Anthology ID:: 2024.ccl-1.23
Volume:: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:: July
Year:: 2024
Address:: Taiyuan, China
Editors:: Sun Maosong, Liang Jiye, Han Xianpei, Liu Zhiyuan, He Yulan
Venue:: CCL
Publisher:: Chinese Information Processing Society of China
Pages:: 305–313
Language:: Chinese
URL:: https://aclanthology.org/2024.ccl-1.23/
Cite (ACL):: Yuqing Cai, Chao Wang, Duojie Renzeng, Yulei Zhu, Jin Zhang, and Tashi Nyima. 2024. 基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 305–313, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):: 基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding) (Cai et al., CCL 2024)
PDF:: https://aclanthology.org/2024.ccl-1.23.pdf