2024
基于神经编解码语言模型的老挝语韵律建模方法(A Method for Lao Prosody Modeling Based on Neural Codec Language Model)
Yi Ningjing (易宁静) | Wang Linqin (王琳钦) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
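“To give synthesized speech the rich prosodic and rhythmic variation of human speech, existing methods commonly use random-number-based duration predictors. These methods simulate the varied rhythm of human speech with latent variables initialized from random numbers. However, because they rely on random noise, the speech they synthesize still tends to lack the diversity and rich prosodic variation of real speech. Unlike previous methods, this paper proposes a prosody modeling method based on a neural codec language model (VALL-E). We model the distribution of prosodic variation with prior speed and pitch temporal curves, incorporate these curves into the training of the neural codec language model, and, at inference time, control the prosody of the generated speech by controlling the prior temporal curves. Experiments show that our method achieves a MOS of 4.05 on synthesized English audio and 3.61 on synthesized Lao audio. The proposed neural-codec-language-model-based Lao prosody modeling method achieves good prosody controllability in terms of both speed and pitch.”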
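The abstract does not say how the prior speed and pitch curves are represented or fed to VALL-E. Purely as an illustration, the sketch below makes several assumptions. It treats the pitch prior as a frame-level F0 curve in Hz and the speed prior as a speaking-rate curve. It quantizes pitch into discrete semitone tokens that a codec language model could be conditioned on, and it shows how rescaling the curves before quantization would shift that conditioning at inference time. The helper names, the 100 Hz reference and the 32-bin layout are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical helpers for illustration only; the paper's actual curve
# representation and conditioning mechanism are not described in the abstract.

def hz_to_semitone_bins(f0_hz: List[float], ref_hz: float = 100.0,
                        n_bins: int = 32, max_semitones: float = 32.0) -> List[int]:
    """Quantize a frame-level F0 curve (Hz) into discrete pitch tokens.

    Token 0 marks unvoiced frames (f0 <= 0). Voiced frames are converted to
    semitones above ref_hz, clipped to [0, max_semitones], and mapped to
    tokens 1 .. n_bins - 1.
    """
    tokens = []
    for f0 in f0_hz:
        if f0 <= 0:
            tokens.append(0)  # unvoiced frame
            continue
        semitones = 12.0 * math.log2(f0 / ref_hz)
        semitones = min(max(semitones, 0.0), max_semitones)
        tokens.append(1 + int(semitones / max_semitones * (n_bins - 2)))
    return tokens


def control_prior(f0_hz: List[float], rate: List[float],
                  pitch_scale: float = 1.0,
                  speed_scale: float = 1.0) -> Tuple[List[float], List[float]]:
    """Rescale the prior pitch and speaking-rate curves before conditioning."""
    scaled_f0 = [f * pitch_scale if f > 0 else 0.0 for f in f0_hz]  # keep unvoiced at 0
    scaled_rate = [r * speed_scale for r in rate]
    return scaled_f0, scaled_rate


if __name__ == "__main__":
    f0 = [0.0, 110.0, 120.0, 130.0, 0.0]  # toy F0 curve in Hz
    rate = [4.0, 4.5, 5.0]                # toy speaking-rate curve
    print(hz_to_semitone_bins(f0))        # [0, 2, 3, 5, 0]
    hi_f0, slow_rate = control_prior(f0, rate, pitch_scale=1.5, speed_scale=0.8)
    print(hz_to_semitone_bins(hi_f0))     # [0, 9, 10, 11, 0]
```

The sketch uses semitones rather than raw Hz because pitch perception is roughly logarithmic, so a fixed scale factor moves every voiced frame by about the same number of bins.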
基于联邦知识蒸馏的跨语言社交媒体事件检测(Cross-Lingual Social Event Detection Based on Federated Knowledge Distillation)
Zhou Shuaishuai (周帅帅) | Zhu Enchang (朱恩昌) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛) | Xian Yantuan (线岩团) | Zhao Zixiao (赵子霄) | Chen Lin (陈霖)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
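“Social media event detection is the task of mining trending events from the content of social media platforms. In practice, data scarcity causes social media event detection to perform poorly in low-resource settings. Existing methods mainly mitigate the low-resource problem through approaches such as cross-lingual knowledge transfer, but they overlook data privacy. We therefore propose FedEvent, a cross-lingual social media event detection framework based on federated knowledge distillation, which distills knowledge from high-resource clients into low-resource clients. The framework combines parameter-efficient fine-tuning with three contrastive losses to map non-English semantic spaces onto the English semantic space, and it adopts a federated distillation strategy that transfers knowledge while preserving data privacy. In addition, we design a four-stage lifecycle mechanism to handle incremental scenarios. Finally, we run experiments on real-world datasets to demonstrate the framework's effectiveness.”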
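The abstract does not describe FedEvent's distillation protocol in detail. For illustration, here is a generic logit-averaging form of federated distillation, which may differ from the paper's. In this form, clients share only softened predictions on a shared unlabeled sample set, never raw posts. The server averages those predictions into consensus soft labels. A low-resource client then minimizes the KL divergence from the consensus to its own predictions. The function names, the temperature of 2.0 and the plain-list data layout are assumptions made for the sketch.

```python
import math
from typing import List

Vector = List[float]

# Generic federated distillation sketch (an assumption, not FedEvent's exact method).

def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax with temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_soft_labels(client_logits: List[List[Vector]],
                          temperature: float = 2.0) -> List[Vector]:
    """Server step: average each client's softened predictions per shared sample.

    client_logits[c][i] holds client c's logits for shared sample i; only these
    logits leave the client, not its training data.
    """
    n_clients = len(client_logits)
    consensus = []
    for i in range(len(client_logits[0])):
        probs = [softmax(client_logits[c][i], temperature) for c in range(n_clients)]
        n_classes = len(probs[0])
        consensus.append([sum(p[j] for p in probs) / n_clients for j in range(n_classes)])
    return consensus


def kd_loss(student_logits: List[Vector], teacher_probs: List[Vector],
            temperature: float = 2.0) -> float:
    """Client step: mean KL(teacher || student), scaled by T^2 as is common in distillation."""
    total = 0.0
    for s_logits, t in zip(student_logits, teacher_probs):
        q = softmax(s_logits, temperature)
        total += sum(tj * math.log(tj / qj) for tj, qj in zip(t, q) if tj > 0)
    return temperature ** 2 * total / len(student_logits)
```

The privacy benefit in this scheme comes from exchanging predictions instead of data or full gradients. The T² factor keeps gradient magnitudes comparable across temperatures.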
基于预训练模型与序列建模的音素分割方法(A Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling)
Yang Shanglong (杨尚龙) | Yu Zhengtao (余正涛) | Wang Wenjun (王文君) | Dong Ling (董凌) | Gao Shengxiang (高盛祥)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
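“Phoneme segmentation is an important task in speech processing and is essential to applications such as keyword spotting and automatic speech recognition. Traditional methods often predict independently whether each audio frame is a phoneme boundary. This ignores the intrinsic relationships between phoneme boundaries, the whole audio sequence and neighboring frames, which hurts segmentation accuracy and coherence. This paper proposes a phoneme segmentation method based on a pre-trained model and sequence modeling. Acoustic features are extracted with HuBERT, a BiLSTM captures long-range dependencies, and a CRF then optimizes the output sequence, improving phoneme boundary detection. Experiments on the TIMIT and Buckeye datasets show that our method outperforms existing approaches, demonstrating the effectiveness of sequence modeling for phoneme segmentation.”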
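The contrast drawn above, independent per-frame decisions versus a CRF over the whole sequence, comes down to how the output is decoded. The sketch below covers only Viterbi decoding in a linear-chain CRF. It assumes two tags per frame (0 = not a boundary, 1 = boundary) and treats the per-frame emission scores as given. In the paper those scores would come from the HuBERT + BiLSTM stack, which is not reproduced here. The toy scores and the transition penalty are made up for illustration.

```python
from typing import List, Sequence

# Viterbi decoding for a linear-chain CRF over frame-level tags.
# Assumed tag set (hypothetical): 0 = non-boundary frame, 1 = boundary frame.

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][y]: score of tag y at frame t (e.g. from a BiLSTM output layer).
    transitions[a][b]: score for moving from tag a at frame t-1 to tag b at frame t.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for b in range(n_tags):
            # Best previous tag a for current tag b.
            best_a = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            ptrs.append(best_a)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda y: score[y])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    em = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.8], [0.0, 0.5], [1.0, 0.0]]
    no_trans = [[0.0, 0.0], [0.0, 0.0]]
    penalize_double = [[0.0, 0.0], [0.0, -2.0]]  # discourage boundary -> boundary
    print(viterbi(em, no_trans))         # [0, 0, 1, 1, 0]  (same as per-frame argmax)
    print(viterbi(em, penalize_double))  # [0, 0, 1, 0, 0]
```

With zero transition scores, decoding reduces to independent per-frame argmax and marks two adjacent boundaries. A learned penalty on boundary-to-boundary transitions lets the sequence-level decoder merge them into one. This is the kind of inter-frame dependency the abstract refers to.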
DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts
Zhou Jie | Gao Shengxiang | Yu Zhengtao | Dong Ling | Wang Wenjun
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Dialect speech recognition has long been one of the challenges for Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly on dialect speech. This is mainly due to the marked differences in pronunciation between dialects and Mandarin and to the scarcity of dialect speech data. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) for low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. The outputs of these experts are then combined to derive the final output. Because dialects are similar to one another, different experts may also help in recognizing other dialects. Experimental results on the Datatang public dialect dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei and Henan dialects by 23.6%, 32.6%, 39.2% and 35.09%, respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.”
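The routing-and-combining step described above can be illustrated with a standard top-k softmax gate. The abstract does not specify DialectMoE's dynamic routing algorithm, so this is a generic stand-in. The sketch assumes a linear gate, experts that preserve the input dimension, and routing per input vector. The names `moe_forward`, `gate_weights` and `top_k` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]

# Generic top-k softmax gating (an assumed stand-in for DialectMoE's routing).

def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector],
                experts: List[Callable[[Vector], Vector]], top_k: int = 2) -> Vector:
    """Route x to the top_k experts by gate probability and mix their outputs.

    gate_weights[e] is a (hypothetical) linear gate row for expert e; each
    expert maps a vector to a vector of the same length.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)  # renormalize over the selected experts
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)
        weight = probs[e] / norm
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out


if __name__ == "__main__":
    experts = [lambda v: [2 * a for a in v],  # toy "expert" 0
               lambda v: [-a for a in v],     # toy "expert" 1
               lambda v: list(v)]             # toy "expert" 2
    gates = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    print(moe_forward([5.0, 0.0], gates, experts, top_k=1))  # [10.0, 0.0]
```

Because every selected expert contributes a weighted share of the output, an expert specialized for one dialect can still affect predictions on similar dialects. This matches the abstract's motivation for sharing experts across dialects.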