Effective Speaker Diarization Leveraging Multi-task Logarithmic Loss Objectives

Jhih-Rong Guo; Tien-Hong Lo; Yu-Sheng Tsao; Pei-Ying Lee; Yung-Chang Hsu; Berlin Chen

Abstract

End-to-End Neural Diarization (EEND) has undergone substantial development, particularly with powerset classification methods that enhance performance but can exacerbate speaker confusion. To address this, we propose a novel training strategy that complements the standard cross-entropy loss with an auxiliary ordinal log loss, guided by a distance matrix of speaker combinations. Our experiments reveal that while this approach yields significant relative improvements of 15.8% in false alarm rate and 10.0% in confusion error rate, it also uncovers a critical trade-off with an increased missed error rate. The primary contribution of this work is the identification and analysis of this trade-off, which stems from the model adopting a more conservative prediction strategy. This insight is crucial for designing more balanced and effective loss functions in speaker diarization.
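
The abstract does not spell out the loss, so the following is only a rough sketch of how a cross-entropy term could be combined with an auxiliary ordinal log loss over powerset classes. Everything specific here is an assumption made for illustration: the distance between two speaker combinations is taken to be the size of their symmetric difference, the ordinal term uses a common formulation, -Σ_i d(y, i)^α · log(1 − p_i), and the names `powerset_classes`, `distance_matrix`, `multitask_loss`, `weight`, and `alpha` are hypothetical rather than taken from the paper.

```python
import itertools
import numpy as np


def powerset_classes(num_speakers=3, max_overlap=2):
    """Enumerate speaker subsets with at most `max_overlap` active speakers."""
    classes = []
    for k in range(max_overlap + 1):
        classes.extend(itertools.combinations(range(num_speakers), k))
    return classes


def distance_matrix(classes):
    """Assumed distance: number of speakers on which two subsets disagree."""
    n = len(classes)
    dist = np.zeros((n, n))
    for i, a in enumerate(classes):
        for j, b in enumerate(classes):
            dist[i, j] = len(set(a) ^ set(b))  # symmetric difference size
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def multitask_loss(logits, targets, dist, weight=0.5, alpha=1.0, eps=1e-7):
    """Cross-entropy plus a weighted ordinal log loss (illustrative only).

    logits:  (frames, classes) unnormalized scores
    targets: (frames,) integer powerset class indices
    dist:    (classes, classes) distance matrix between speaker combinations
    """
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    frames = np.arange(len(targets))

    # Standard cross-entropy on the reference class.
    ce = -log_probs[frames, targets].mean()

    # Ordinal term: probability mass on classes far from the reference
    # is penalized more heavily than mass on nearby classes.
    penalties = dist[targets] ** alpha
    log_complement = np.log(np.clip(1.0 - probs, eps, 1.0))
    oll = -(penalties * log_complement).sum(axis=-1).mean()

    return ce + weight * oll, ce, oll
```

Under these assumptions, a frame whose reference is speaker 0 alone is penalized less for predicting "speakers 0 and 1" (distance 1) than for predicting "speakers 1 and 2" (distance 3), even though the cross-entropy term is identical in both cases. Because the ordinal term penalizes any confident prediction on a distant class, a model trained this way may plausibly drift toward more cautious outputs, which is consistent with the conservative behavior the abstract reports.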

Anthology ID: 2025.rocling-main.17
Volume: Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month: November
Year: 2025
Address: National Taiwan University, Taipei City, Taiwan
Editors: Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue: ROCLING
Publisher: Association for Computational Linguistics
Pages: 140–145
URL: https://aclanthology.org/2025.rocling-main.17/
Cite (ACL): Jhih-Rong Guo, Tien-Hong Lo, Yu-Sheng Tsao, Pei-Ying Lee, Yung-Chang Hsu, and Berlin Chen. 2025. Effective Speaker Diarization Leveraging Multi-task Logarithmic Loss Objectives. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 140–145, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal): Effective Speaker Diarization Leveraging Multi-task Logarithmic Loss Objectives (Guo et al., ROCLING 2025)
PDF: https://aclanthology.org/2025.rocling-main.17.pdf
