Run Chen
2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
|
Run Chen
|
Julia Hirschberg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select emotion and control intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible by recent advances in foundation voice cloning models. Based on the few-shot capability of our emotion control framework, we propose two methods to apply emotion control on emotions described by open-ended text, enabling an intuitive interface for controlling a diverse array of nuanced emotions. To facilitate a more systematic emotional speech synthesis field, we introduce a set of evaluation metrics designed to rigorously assess the faithfulness and recognizability of emotion control frameworks. Through objective and subjective evaluations, we show that our emotion control framework effectively embeds emotions into speech and surpasses emotion expressiveness of commercial TTS services.
2020
融入多尺度特征注意力的胶囊神经网络及其在文本分类中的应用(Incorporating Multi-scale Feature Attention into Capsule Network and its Application in Text Classification)
Chaofan Wang (王超凡)
|
Shenggen Ju (琚生根)
|
Jieping Sun (孙界平)
|
Run Chen (陈润)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
近些年来,胶囊神经网络(Capsnets)由于拥有强大的文本特征学习能力已被应用到了文本分类任务当中。目前的研究工作大部分都将提取到的文本多元语法特征视为同等重要,而忽略了单词所对应各个多元语法特征的重要程度应该是由具体上下文决定的这一问题,这将直接影响到模型对整个文本的语义理解。针对上述问题,本文提出了多尺度特征部分连接胶囊网络(MulPart-Capsnets)。该方法将多尺度特征注意力融入到Capsnets中,多尺度特征注意力能够自动选择不同尺度的多元语法特征,通过对其进行加权求和,就能为每个单词精确捕捉到丰富的多元语法特征。同时,为了减少子胶囊与父胶囊之间的冗余信息传递,本文同时也对路由算法进行了改进。 本文提出的算法在文本分类任务上针对七个著名的数据集进行了有效性验证,和现有的研究工作相比,性能显著提高,说明了本文的算法能够捕获文本中更丰富的多元语法特征,具有更加强大的文本特征学习能力。
Search