基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding)

Cai Yuqing (蔡郁青), Wang Chao (王超), Renzeng Duojie (仁增多杰), Zhu Yulei (朱宇雷), Zhang Jin (张瑾), Nyima Tashi (尼玛扎西)


Abstract
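End-to-end Tibetan speech recognition suffers from inconsistent modeling units and unsatisfactory recognition performance. To address these problems, this paper proposes BPE-Conformer-CTC/Attention, an end-to-end Tibetan speech recognition method. First, the method uses the byte pair encoding (BPE) algorithm to build its modeling units. BPE repeatedly merges the most frequent pair of adjacent symbols, which segments text into manageable, meaningful units and balances their granularity, and this resolves the inconsistency of modeling units in Tibetan speech recognition. Second, the method uses a Conformer encoder, which effectively combines global and local dependencies in the audio sequence and so strengthens the model's representational capacity. Finally, a joint CTC/Attention decoding strategy speeds up alignment and decoding, improving both recognition accuracy and efficiency. Experiments on the open-source datasets XBMU-AMDO31 and TIBMD@MUCI show that the proposed BPE-Conformer-CTC/Attention model achieves word error rates of 9.0% and 4.6%, respectively. Compared with the baseline Transformer-CTC/Attention model, these are relative word error rate reductions of 14.2% and 30.3%. The method provides an effective solution for end-to-end Tibetan speech recognition.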
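The abstract describes BPE as repeatedly merging the most frequent pair of adjacent symbols. The sketch below illustrates that merge-learning loop in the classic word-level form. It is an illustration under stated assumptions, not the authors' implementation: the function names (`learn_bpe`, `apply_bpe`, and the helpers) are hypothetical, ties are broken by first occurrence, and there is no end-of-word marker or byte fallback, which production tokenizers such as SentencePiece handle differently. A toy English corpus is used for readability. Applied to Tibetan text, the same code would start from Unicode code points, so vowel signs and subjoined consonants would begin as separate symbols until merged.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Return a new vocab in which every occurrence of `pair` becomes one symbol."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Hypothetical helper: learn up to `num_merges` merge rules from a corpus."""
    # Each whitespace-separated word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a new word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return symbols


if __name__ == "__main__":
    merges, vocab = learn_bpe("low low low lower newest newest", 3)
    print(merges)
    print(apply_bpe("lowest", merges))
```

On this toy corpus the three learned merges are `('l', 'o')`, `('lo', 'w')` and `('n', 'e')`, and `apply_bpe("lowest", merges)` returns `('low', 'e', 's', 't')`. The unseen word is thus covered by one frequent subword plus characters. The number of merges controls the granularity trade-off the abstract refers to: more merges yield longer, fewer units per utterance.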
Anthology ID:
2024.ccl-1.23
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
305–313
Language:
Chinese
URL:
https://aclanthology.org/2024.ccl-1.23/
Cite (ACL):
Cai Yuqing, Wang Chao, Renzeng Duojie, Zhu Yulei, Zhang Jin, and Nyima Tashi. 2024. 基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding). In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 305–313, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding) (Yuqing et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.23.pdf