2024
基于神经编解码语言模型的老挝语韵律建模方法(A Method for Lao Prosody Modeling Based on Neural Codec Language Model)
Yi Ningjing (易宁静) | Wang Linqin (王琳钦) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“To endow synthesized speech with the rich prosody and rhythmic variation of human language, existing methods commonly adopt duration predictors based on random noise, simulating the diverse rhythms of human speech through latent variables initialized with random numbers. However, limited by this reliance on random noise, the speech synthesized by these methods still tends to lack the diversity and prosodic richness of real speech. Unlike previous approaches, this paper proposes a prosody modeling method based on a neural codec language model (VALL-E): prior temporal curves of speed and pitch are used to model the distribution of prosodic variation and are effectively incorporated into the training of the neural codec language model, and at inference time the prosody of the generated speech can be controlled by manipulating the prior temporal curves. Experiments show that the proposed method achieves a MOS of 4.05 for synthesized English audio and 3.61 for synthesized Lao audio. The proposed Lao prosody modeling method based on a neural codec language model achieves good prosody controllability in terms of speed and pitch.”
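The abstract above does not give implementation details, so the following is only a minimal PyTorch sketch of how per-frame prior speed and pitch curves could be projected and added to the input embeddings of a decoder-only codec language model; the class, dimensions, and toy data are hypothetical assumptions, not the authors' code.

import torch
import torch.nn as nn

class ProsodyConditionedCodecLM(nn.Module):
    """Hypothetical sketch: a decoder-only LM over codec tokens whose input
    embeddings are augmented with a prior speed/pitch temporal curve."""
    def __init__(self, codebook_size=1024, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size, d_model)
        self.prosody_proj = nn.Linear(2, d_model)  # (speed, pitch) per frame -> d_model
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, codec_tokens, prosody_curve):
        # codec_tokens: (B, T) int64 codec indices; prosody_curve: (B, T, 2) float
        x = self.token_emb(codec_tokens) + self.prosody_proj(prosody_curve)
        causal = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), diagonal=1)
        return self.head(self.backbone(x, mask=causal))  # next-token logits

# toy usage: at inference, editing the curve changes the prosody conditioning
model = ProsodyConditionedCodecLM()
tokens = torch.randint(0, 1024, (1, 50))
curve = torch.stack([torch.linspace(0.8, 1.2, 50),    # speed contour
                     torch.linspace(-1.0, 1.0, 50)],  # pitch contour
                    dim=-1).unsqueeze(0)
logits = model(tokens, curve)  # (1, 50, 1024)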
基于方面引导的图文渐进融合的多模态方面级情感分析方法(Aspect-Guided Progressive Fusion of Text and Image for Multimodal Aspect-Based Sentiment Analysis)
Yan Zida (闫自达) | Guo Junjun (郭军军) | Yu Zhengtao (余正涛)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Multimodal aspect-based sentiment analysis aims to identify the sentiment polarity of a specific aspect by combining image and text information. However, as two different modalities, images and text differ significantly in data representation and semantic expression; narrowing the modality gap and fusing cross-modal features are two key problems in this task. To address them, this paper proposes an aspect-guided progressive image-text fusion method for multimodal aspect-based sentiment analysis. The method uses the aspect information shared by the image and the text as a pivot, applying aspect-guided image-text contrastive learning and contrast-based cross-modal semantic interaction to narrow the modality gap and promote semantic interaction; it then integrates visual and textual information in a multimodal feature space and promotes cross-modal feature fusion through aspect-guided, contrast-based multimodal semantic fusion, thereby improving multimodal sentiment analysis performance. Experimental results on three multimodal aspect-based sentiment analysis benchmark datasets demonstrate the effectiveness of the proposed method, which outperforms most other state-of-the-art multimodal aspect-based sentiment analysis methods.”
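As a rough illustration of the aspect-as-pivot idea, the sketch below shows a symmetric InfoNCE loss in which an aspect embedding pools the text and image token features before the two views are contrasted; the function name, pooling scheme, and dimensions are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def aspect_guided_contrastive_loss(text_feats, image_feats, aspect_emb, temperature=0.07):
    """Hypothetical sketch: the aspect embedding acts as an attention query that pools
    text and image token features; the two pooled views of the same post are then
    pulled together with a symmetric InfoNCE loss."""
    # text_feats: (B, Lt, D), image_feats: (B, Lv, D), aspect_emb: (B, D)
    def aspect_pool(feats):
        attn = torch.softmax((feats @ aspect_emb.unsqueeze(-1)).squeeze(-1), dim=-1)  # (B, L)
        return (attn.unsqueeze(-1) * feats).sum(dim=1)                                # (B, D)

    t = F.normalize(aspect_pool(text_feats), dim=-1)
    v = F.normalize(aspect_pool(image_feats), dim=-1)
    logits = t @ v.t() / temperature                   # (B, B) cross-modal similarities
    labels = torch.arange(t.size(0), device=t.device)  # matched pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# toy usage
loss = aspect_guided_contrastive_loss(torch.randn(4, 16, 256),  # text tokens
                                      torch.randn(4, 9, 256),   # image patches
                                      torch.randn(4, 256))      # aspect embedding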
基于联邦知识蒸馏的跨语言社交媒体事件检测(Cross-Lingual Social Event Detection Based on Federated Knowledge Distillation)
Zhou Shuaishuai (周帅帅) | Zhu Enchang (朱恩昌) | Gao Shengxiang (高盛祥) | Yu Zhengtao (余正涛) | Xian Yantuan (线岩团) | Zhao Zixiao (赵子霄) | Chen Lin (陈霖)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Social media event detection aims to mine trending events from the content of various social media platforms. In practice, due to data scarcity, social media event detection performs poorly in low-resource settings. Existing methods mainly alleviate the low-resource problem through cross-lingual knowledge transfer, but they ignore data privacy. This paper therefore proposes FedEvent, a cross-lingual social media event detection framework based on federated knowledge distillation, which aims to distill knowledge from high-resource clients to low-resource clients. By combining parameter-efficient fine-tuning with three sets of contrastive losses, the framework effectively maps non-English semantic spaces into the English semantic space, and it adopts a federated distillation strategy to transfer knowledge while preserving data privacy. In addition, we design a four-stage lifecycle mechanism to accommodate incremental scenarios. Finally, experiments on real-world datasets demonstrate the effectiveness of the framework.”
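A minimal single-process mock of one federated distillation round is sketched below: the high-resource (teacher) client only contributes soft logits on a shared batch, while the low-resource (student) client combines the distillation term with its own supervised loss. The function, loss weighting, and toy models are assumptions, and the sketch omits the parameter-efficient fine-tuning modules, contrastive losses, and lifecycle mechanism described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_round(teacher, student, shared_batch, local_batch, optimizer,
                  temperature=2.0, alpha=0.5):
    """Hypothetical single round of federated distillation: the high-resource client
    only contributes soft logits on a shared batch, so its raw posts never leave it;
    the low-resource client mixes that signal with its own supervised loss."""
    with torch.no_grad():
        teacher_logits = teacher(shared_batch["x"])        # computed on the rich-resource client
    student_logits = student(shared_batch["x"])            # low-resource client side
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student(local_batch["x"]), local_batch["y"])  # local labeled data
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage with linear stand-ins for the two clients' event classifiers
teacher, student = nn.Linear(32, 5), nn.Linear(32, 5)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
distill_round(teacher, student,
              {"x": torch.randn(8, 32)},
              {"x": torch.randn(8, 32), "y": torch.randint(0, 5, (8,))}, opt)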
基于预训练模型与序列建模的音素分割方法(A Phoneme Segmentation Method Based on Pre-trained Models and Sequence Modeling)
Yang Shanglong (杨尚龙) | Yu Zhengtao (余正涛) | Wang Wenjun (王文君) | Dong Ling (董凌) | Gao Shengxiang (高盛祥)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Phoneme segmentation is an important task in speech processing and is crucial for applications such as keyword spotting and automatic speech recognition. Traditional methods typically predict independently whether each audio frame is a phoneme boundary, ignoring the intrinsic relationships between phoneme boundaries, the whole audio sequence, and neighboring frames, which hurts the accuracy and coherence of the segmentation. This paper proposes a phoneme segmentation method based on pre-trained models and sequence modeling: on top of acoustic features extracted by HuBERT, a BiLSTM captures long-range dependencies and a CRF refines the label sequence, improving phoneme boundary detection. Experiments on the TIMIT and Buckeye datasets show that the proposed method outperforms existing approaches, demonstrating the effectiveness of sequence modeling for phoneme segmentation.”
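A minimal sketch of the HuBERT → BiLSTM → CRF pipeline described above, assuming the Hugging Face transformers HubertModel (facebook/hubert-base-ls960 checkpoint) and the pytorch-crf package; the layer sizes and the binary boundary/non-boundary tag set are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn
from transformers import HubertModel
from torchcrf import CRF  # pip install pytorch-crf

class BoundaryTagger(nn.Module):
    """Sketch: HuBERT frame features -> BiLSTM -> per-frame emissions -> CRF."""
    def __init__(self, hidden=256, num_tags=2):  # tags: 0 = inside, 1 = boundary
        super().__init__()
        self.encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, waveform, tags=None):
        frames = self.encoder(waveform).last_hidden_state   # (B, T, H), ~50 frames/s
        emissions = self.emit(self.lstm(frames)[0])          # (B, T, num_tags)
        if tags is not None:
            return -self.crf(emissions, tags)                # training: negative log-likelihood
        return self.crf.decode(emissions)                    # inference: best tag sequence

# toy usage on one second of 16 kHz audio
model = BoundaryTagger()
boundaries = model(torch.randn(1, 16000))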
DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts
Zhou Jie | Gao Shengxiang | Yu Zhengtao | Dong Ling | Wang Wenjun
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Dialect speech recognition has always been one of the challenges in Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly when handling dialect speech. This is mainly due to the obvious differences between dialects and Mandarin in pronunciation and to the data scarcity of dialect speech. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) under low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. Subsequently, the outputs of these experts are combined to derive the final output. Due to the similarities among dialects, distinct experts may also help recognize other dialects. Experimental results on the Datatang dialect public dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei and Henan dialects by 23.6%, 32.6%, 39.2% and 35.09% respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.”
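The abstract describes a router that assigns inputs to dialect experts and then combines their outputs; the sketch below implements a generic softmax-gated, top-k mixture-of-experts layer along those lines, with the expert count, feed-forward experts, and top-k routing chosen as assumptions rather than taken from the paper.

import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Generic softmax-gated mixture-of-experts layer: a router scores every frame,
    the top-k experts process it, and their outputs are mixed by the gate weights."""
    def __init__(self, d_model=256, n_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                                      # x: (B, T, d_model) frames
        gate = torch.softmax(self.router(x), dim=-1)           # (B, T, n_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the kept gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                     # frames routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# toy usage: a batch of 2 utterances, 100 frames each
y = MoELayer()(torch.randn(2, 100, 256))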
Chinese Grammatical Error Correction via Large Language Model Guided Optimization Training
Liu Xiao | Li Ying | Yu Zhengtao
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Pre-trained language model-based methods for Chinese Grammatical Error Correction (CGEC) are categorized into Seq2Seq and Seq2Edit types. However, both Seq2Seq and Seq2Edit models depend significantly on high-quality training data. Considering the strong generation and inference ability of large language models (LLMs), we propose a large language model-guided optimization training method that exploits LLMs to extract error knowledge and optimize the training process of traditional CGEC models. On the one hand, we use error types and confusion sets as extra knowledge to guide LLMs to generate diverse pseudo data, thus extending the error distribution of our training data. On the other hand, LLMs are utilized to infer the predicted results of our CGEC models and obtain re-training data, thus iteratively optimizing our pre-trained CGEC models. Experiments on two benchmark datasets show that our LLM-guided optimization method with small-scale training data can achieve results comparable to baseline models trained with large-scale training data. Detailed comparison experiments demonstrate that both the early devised pseudo data and the later re-training data are extremely useful for traditional CGEC model optimization training, and that they can benefit from each other. We will release our code and prompts at https://github.com/SakuraAcedia/llm-cgec-got to facilitate future work.”
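To make the pseudo-data step concrete, the sketch below shows one way error types and confusion sets could be folded into an LLM prompt that injects a controlled error into a correct sentence; the prompt wording, the call_llm placeholder, and the JSON format are hypothetical and are not the released prompts from the repository above.

import json

def build_pseudo_data_prompt(correct_sentence, error_type, confusion_set):
    """Hypothetical prompt: the error type and confusion set steer the kind of
    mistake the LLM injects into an originally correct sentence."""
    return ("You are generating training data for Chinese grammatical error correction.\n"
            f"Rewrite the sentence so it contains exactly one error of type '{error_type}'.\n"
            f"Prefer substitutions drawn from this confusion set: {'、'.join(confusion_set)}.\n"
            f"Sentence: {correct_sentence}\n"
            'Answer as JSON: {"erroneous": "...", "correct": "..."}')

def make_pseudo_pair(correct_sentence, error_type, confusion_set, call_llm):
    """call_llm is a placeholder for whatever chat-completion client is available."""
    reply = call_llm(build_pseudo_data_prompt(correct_sentence, error_type, confusion_set))
    record = json.loads(reply)
    return record["erroneous"], record["correct"]

# toy usage with a stubbed LLM so the snippet runs offline
fake_llm = lambda prompt: json.dumps(
    {"erroneous": "我门去公园。", "correct": "我们去公园。"}, ensure_ascii=False)
pair = make_pseudo_pair("我们去公园。", "character confusion", ["门", "们"], fake_llm)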
2021
Emotion Classification of COVID-19 Chinese Microblogs based on the Emotion Category Description
Guo Xianwei | Lai Hua | Xiang Yan | Yu Zhengtao | Huang Yuxin
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Emotion classification of COVID-19 Chinese microblogs helps analyze the public opinion triggered by COVID-19. Existing methods only consider the features of the microblog itself, without combining the semantics of emotion categories for modeling. Emotion classification of microblogs is a process of reading the content of a microblog and combining the semantics of emotion categories to understand whether it contains a certain emotion. Inspired by this, we propose an emotion classification model based on emotion category descriptions for COVID-19 Chinese microblogs. Firstly, we expand all emotion categories into formalized category descriptions. Secondly, based on the idea of question answering, we construct a question for each microblog in the form of ‘What is the emotion expressed in the text X?’ and regard all category descriptions as candidate answers. Finally, we construct question-and-answer pairs and use them as the input of the BERT model to complete emotion classification. By integrating rich contextual and category semantics, the model can better understand the emotion of microblogs. Experiments on the COVID-19 Chinese microblog dataset show that our approach outperforms many existing emotion classification methods, including the BERT baseline.
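A minimal sketch of the question-answering formulation: each (question, category description) pair is scored with a sentence-pair BERT classifier and the best-scoring category wins. The bert-base-chinese checkpoint, the binary match/no-match head (which would still need fine-tuning on labeled pairs), and the example descriptions are assumptions, not the paper's released setup.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical sketch: score each (question, category description) pair with a
# sentence-pair BERT classifier and return the best-matching emotion category.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

def classify_emotion(microblog_text, category_descriptions):
    question = f"What is the emotion expressed in the text {microblog_text}?"
    scores = []
    for desc in category_descriptions:
        inputs = tokenizer(question, desc, return_tensors="pt",
                           truncation=True, max_length=256)
        with torch.no_grad():
            logits = model(**inputs).logits                 # (1, 2): [no-match, match]
        scores.append(logits.softmax(dim=-1)[0, 1].item())  # probability the pair matches
    return max(range(len(scores)), key=scores.__getitem__)  # index of the best category

# toy usage with two made-up category descriptions (the head is untrained here)
best = classify_emotion("今天终于解封了,太开心了!",
                        ["This text expresses happiness or joy.",
                         "This text expresses fear or anxiety."])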