Pei-Ying Lee


2025

Effective Speaker Diarization Leveraging Multi-task Logarithmic Loss Objectives
Jhih-Rong Guo | Tien-Hong Lo | Yu-Sheng Tsao | Pei-Ying Lee | Yung-Chang Hsu | Berlin Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

End-to-End Neural Diarization (EEND) has undergone substantial development, particularly with powerset classification methods that enhance performance but can exacerbate speaker confusion. To address this, we propose a novel training strategy that complements the standard cross-entropy loss with an auxiliary ordinal log loss, guided by a distance matrix over speaker combinations. Our experiments reveal that while this approach yields significant relative improvements of 15.8% in false alarm rate and 10.0% in confusion error rate, it also incurs an increased missed error rate, exposing a critical trade-off. The primary contribution of this work is the identification and analysis of this trade-off, which stems from the model adopting a more conservative prediction strategy. This insight is crucial for designing more balanced and effective loss functions for speaker diarization.
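The abstract describes a multi-task objective: standard cross-entropy over powerset classes plus an auxiliary log loss guided by a distance matrix between speaker combinations. The sketch below illustrates one plausible form of such a loss in NumPy; the Hamming distance between activity sets, the exponential soft-label weighting, and the `lam` mixing weight are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from itertools import product

def powerset_classes(num_speakers):
    # Each powerset class is a tuple of active-speaker flags,
    # e.g. (0, 1) = only speaker 2 active.
    return list(product([0, 1], repeat=num_speakers))

def distance_matrix(classes):
    # Assumed distance: Hamming distance between activity sets, i.e. how
    # many speakers' on/off states differ between two powerset classes.
    n = len(classes)
    D = np.zeros((n, n))
    for i, a in enumerate(classes):
        for j, b in enumerate(classes):
            D[i, j] = sum(x != y for x, y in zip(a, b))
    return D

def multitask_loss(logits, target, D, lam=0.5):
    # logits: (num_classes,) raw frame-level scores; target: true class index.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ce = -np.log(p[target] + 1e-12)          # standard cross-entropy term
    # Hypothetical auxiliary term: a distance-guided log loss where soft
    # targets decay exponentially with distance from the true class, so
    # probability mass on "near" classes (few speakers wrong) is penalized
    # less than mass on "far" classes.
    soft = np.exp(-D[target])
    soft /= soft.sum()
    aux = -np.sum(soft * np.log(p + 1e-12))
    return ce + lam * aux
```

Under this assumed weighting, confusing the true class with a one-speaker-off neighbor costs less than confusing it with a class that flips every speaker, which is the kind of distance-aware guidance the abstract attributes to the auxiliary objective.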