Voice Activity Detection on Italian Language

Shibingfeng Zhang, Gloria Gagliardi, Fabio Tamburini


Abstract
Voice Activity Detection (VAD) is the task of identifying human voice activity in noisy settings, and it plays a crucial role in fields such as speech recognition and audio surveillance. However, most VAD research focuses on English, leaving other languages, such as Italian, under-explored. This study aims to evaluate and enhance VAD systems for Italian speech, with the goal of finding a solution for the speech segmentation component of the Digital Linguistic Biomarkers (DLBs) extraction pipeline for the early diagnosis of mental disorders. We evaluate several VAD systems and propose an ensemble VAD system that integrates the best-performing models. Our ensemble system shows significant improvements in speech event detection. This advancement lays a robust foundation for more accurate early detection of mental health issues using DLBs in Italian.
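The abstract does not say how the ensemble combines its member models. As a purely illustrative sketch, one common way to integrate several VAD systems is frame-level majority voting over their speech/non-speech decisions. The function names `majority_vote` and `frames_to_segments`, the binary frame labels, and the fixed frame length `frame_s` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def majority_vote(frame_decisions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-frame labels (1 = speech, 0 = non-speech) from several detectors.

    Hypothetical helper: the paper's actual combination rule is not stated in the abstract.
    """
    if not frame_decisions:
        return []
    n_frames = len(frame_decisions[0])
    if any(len(d) != n_frames for d in frame_decisions):
        raise ValueError("all detectors must label the same number of frames")
    n_models = len(frame_decisions)
    # A frame counts as speech when strictly more than half of the detectors agree.
    return [
        int(2 * sum(d[i] for d in frame_decisions) > n_models)
        for i in range(n_frames)
    ]


def frames_to_segments(labels: Sequence[int], frame_s: float = 0.01) -> List[Tuple[float, float]]:
    """Turn frame labels into (start, end) speech segments, in seconds."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i  # a speech run begins at this frame
        elif not label and start is not None:
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        # Close a speech run that extends to the last frame.
        segments.append((start * frame_s, len(labels) * frame_s))
    return segments
```

With three detectors, `majority_vote` marks a frame as speech only when at least two of them agree, and `frames_to_segments` then yields the speech intervals that a downstream segmentation step could consume.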
Anthology ID:
2024.clicit-1.111
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
1024–1029
URL:
https://aclanthology.org/2024.clicit-1.111/
Cite (ACL):
Shibingfeng Zhang, Gloria Gagliardi, and Fabio Tamburini. 2024. Voice Activity Detection on Italian Language. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 1024–1029, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Voice Activity Detection on Italian Language (Zhang et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.111.pdf