A Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling (基于预训练模型与序列建模的音素分割方法)

Yang Shanglong (杨尚龙), Yu Zhengtao (余正涛), Wang Wenjun (王文君), Dong Ling (董凌), Gao Shengxiang (高盛祥)


Abstract
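Phoneme segmentation is an important task in speech processing and is critical for applications such as keyword spotting and automatic speech recognition. Conventional methods typically predict independently whether each audio frame is a phoneme boundary, ignoring how a boundary relates to the audio sequence as a whole and to its adjacent frames, which degrades the accuracy and coherence of the segmentation. This paper proposes a phoneme segmentation method based on a pre-trained model and sequence modeling: acoustic features are extracted with HuBERT, a BiLSTM captures long-range dependencies, and a CRF then optimizes the label sequence, improving phoneme boundary detection. Experiments on the TIMIT and Buckeye datasets show that the proposed method outperforms existing approaches, demonstrating the effectiveness of sequence modeling for phoneme segmentation.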
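The contrast the abstract draws between independent per-frame prediction and sequence-level decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the emission scores stand in for the output of a HuBERT + BiLSTM encoder, the transition scores stand in for learned CRF parameters, and every number and name below (`viterbi_decode`, `emissions`, `transitions`) is an assumption chosen for illustration. Label 0 means "not a boundary" and label 1 means "boundary".

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path of a linear-chain model.

    emissions[t][k]   : score of label k at frame t
    transitions[j][k] : score of moving from label j to label k
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    score = list(emissions[0])          # best path score ending in each label
    backptr: List[List[int]] = []       # best previous label, per frame and label
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for k in range(n_labels):
            best_j = max(range(n_labels),
                         key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k]
                             + emissions[t][k])
            ptr.append(best_j)
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final label to recover the full path.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical per-frame scores: frames 2 and 3 both lean towards "boundary".
emissions = [[2.0, 0.0], [2.0, 0.0], [0.0, 1.0],
             [0.0, 1.5], [2.0, 0.0], [2.0, 0.0]]
# Hypothetical transition scores: two consecutive boundary frames are penalized.
transitions = [[0.0, 0.0],
               [0.0, -5.0]]

# Independent prediction: pick the best label for each frame in isolation.
independent = [max(range(2), key=lambda k: frame[k]) for frame in emissions]
# Sequence-level prediction: pick the best label path as a whole.
decoded = viterbi_decode(emissions, transitions)

print(independent)
print(decoded)
```

With these made-up scores, the per-frame decision marks two adjacent frames as boundaries, while the sequence-level decoder keeps only the stronger one, because the transition penalty makes the doubled boundary a worse path overall.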
Anthology ID:
2024.ccl-1.49
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
625–636
Language:
Chinese
URL:
https://aclanthology.org/2024.ccl-1.49/
Cite (ACL):
Yang Shanglong, Yu Zhengtao, Wang Wenjun, Dong Ling, and Gao Shengxiang. 2024. 基于预训练模型与序列建模的音素分割方法 [A Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling]. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 625–636, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
基于预训练模型与序列建模的音素分割方法 [A Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling] (Yang et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.49.pdf