Shibingfeng Zhang


2024

Voice Activity Detection on Italian Language
Shibingfeng Zhang | Gloria Gagliardi | Fabio Tamburini
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Voice Activity Detection (VAD) refers to the task of identifying human voice activity in noisy settings, playing a crucial role in fields like speech recognition and audio surveillance. However, most VAD research focuses on English, leaving other languages, such as Italian, under-explored. This study aims to evaluate and enhance VAD systems for Italian speech, with the goal of finding a solution for the speech segmentation component of the Digital Linguistic Biomarkers (DLBs) extraction pipeline for early mental disorder diagnosis. We experimented with various VAD systems and propose an ensemble VAD system that integrates the best-performing models. Our ensemble system shows significant improvements in speech event detection. This advancement lays a robust foundation for more accurate early detection of mental health issues using DLBs in Italian.

2023

GPL at SemEval-2023 Task 1: WordNet and CLIP to Disambiguate Images
Shibingfeng Zhang | Shantanu Nath | Davide Mazzaccara
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Given a word in context, the task of Visual Word Sense Disambiguation consists of selecting the correct image among a set of candidates. To select the correct image, we propose a solution blending text augmentation and multimodal models. Text augmentation leverages the fine-grained semantic annotation from WordNet to get a better representation of the textual component. We then compare this sense-augmented text to the set of images using the pre-trained multimodal models CLIP and ViLT. Our system has been ranked 16th for the English language, achieving 68.5 points for hit rate and 79.2 for mean reciprocal rank.