Jingwei Zhang

Also published as: 璟玮


2024

"Selecting acoustic features that effectively convey emotion is essential to research on speech emotion. For emotions with identical or similar acoustic profiles, fundamental frequency and duration alone cannot distinguish them effectively in acoustic studies. This study expands the variety and number of acoustic parameters and, using three machine learning methods, selects multiple sets of acoustic parameters that effectively distinguish emotion types, supplementing and refining the acoustic feature set for speech emotion research. We find that the acoustic parameters relied on to distinguish different emotions, as well as their number and their contributions, all differ, with spectral and signal-to-noise parameters playing an important role. This study provides a reference for parameter selection in the acoustic analysis of speech emotion."
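The core idea of the abstract above — ranking candidate acoustic parameters by how well they separate emotion classes — can be sketched as follows. This is a hypothetical illustration only: the feature names, the toy measurements, and the effect-size score (Cohen's d) are stand-ins, not the paper's three machine learning methods or its data.

```python
# Hypothetical sketch: rank acoustic parameters by how well they separate
# two emotion classes, using a simple effect-size score. All values below
# are invented for illustration.
import statistics

def effect_size(a, b):
    # Standardized mean difference between the two emotion classes.
    pooled = statistics.pstdev(a + b)
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled

# Toy measurements for two emotions (e.g. anger vs. sadness).
features = {
    "F0_mean":  ([220, 230, 240], [180, 185, 190]),    # fundamental frequency (Hz)
    "duration": ([0.9, 1.0, 1.1], [1.0, 1.05, 0.95]),  # similar values -> weak cue
    "HNR":      ([8, 9, 10], [14, 15, 16]),            # harmonics-to-noise ratio (dB)
}

ranked = sorted(features, key=lambda k: effect_size(*features[k]), reverse=True)
print(ranked)  # the noise parameter HNR ranks above duration here
```

In this toy ranking, duration scores near zero because its class distributions overlap, mirroring the abstract's point that fundamental frequency and duration alone cannot separate acoustically similar emotions, while spectral and signal-to-noise parameters carry more weight.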

2023

Recently, pre-trained vision-language (VL) models have achieved remarkable success in various cross-modal tasks, including referring expression comprehension (REC). These models are pre-trained on large-scale image-text pairs to learn the alignment between words in textual descriptions and objects in the corresponding images, and are then fine-tuned on downstream tasks. However, the performance of VL models is hindered by implicit text, which describes objects through comparisons between two or more objects rather than mentioning them explicitly, because the models struggle to align the implicit text with the objects in the images. To address this challenge, we introduce CLEVR-Implicit, a dataset consisting of synthetic images paired with two types of implicit text for the REC task. Additionally, to improve the performance of VL models on implicit text, we propose a method called Transforming Implicit text into Explicit text (TIE), which enables VL models to reason with implicit text. TIE consists of two modules: (1) the prompt design module builds prompts for implicit text by adding masked tokens, and (2) the cloze procedure module fine-tunes the prompts by using masked language modeling (MLM) to predict the explicit words from the implicit prompts. Experimental results on our dataset demonstrate a significant improvement of 37.94% in the performance of VL models on implicit text after applying our TIE method.
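The two TIE modules described above can be sketched as a toy pipeline. This is a hedged illustration, not the paper's implementation: the prompt template, the example sentence, and the dictionary standing in for a fine-tuned MLM are all invented here.

```python
# Toy illustration of the two TIE modules (assumed interfaces; the paper's
# actual prompt templates and fine-tuned MLM are not reproduced here).

def build_prompt(implicit_text: str) -> str:
    # Prompt-design module: append a masked slot that an MLM would later
    # fill with an explicit attribute word (e.g. a color).
    return f"{implicit_text} It is the [MASK] one."

def cloze_fill(prompt: str, mlm_predict) -> str:
    # Cloze-procedure module: replace the masked token with the MLM's
    # top prediction, turning the implicit prompt into explicit text.
    return prompt.replace("[MASK]", mlm_predict(prompt))

# Stand-in for a fine-tuned masked language model (hypothetical).
def toy_mlm(prompt: str) -> str:
    return "red" if "left of the blue cube" in prompt else "large"

implicit = "Find the object that is left of the blue cube."
explicit = cloze_fill(build_prompt(implicit), toy_mlm)
print(explicit)
# Find the object that is left of the blue cube. It is the red one.
```

The resulting explicit text can then be handed to a standard REC model, which aligns far more easily with an explicit attribute word than with a purely comparative description.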

2018

2015

2014