Exploring Sentence Stress Detection using Whisper-based Speech Models

Ting-An Hung, Yu-Hsuan Hsieh, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen


Abstract
Sentence stress reflects the relative prominence of words within a sentence. It is fundamental to speech intelligibility and naturalness, and it is particularly important in second language (L2) learning, where accurate stress production facilitates effective communication and reduces misinterpretation. In this work, we investigate sentence stress detection (SSD) using Whisper-based transformer speech models under diverse settings, including model scaling, backbone–decoder interactions, architectural and regularization enhancements, and embedding visualization for interpretability. Results show that smaller Whisper variants achieve stronger performance when training data are limited, while architectural and regularization enhancements improve stability and generalization. Embedding analysis reveals a clear separation between stressed and unstressed words. These findings offer practical insights into model selection, architecture design, and interpretability for SSD applications, with implications for L2 learning support tools.
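The abstract does not state how word-level stress predictions are obtained from the Whisper backbone. As a purely illustrative sketch, and not the authors' method, one common way to frame SSD is binary classification over word-level embeddings: encoder frames are mean-pooled within each word's time span and passed to a classifier. The sketch assumes that word start/end times are already available (for example, from forced alignment) and that the Whisper encoder emits one frame per 20 ms. The helper names pool_word_embeddings, train_logistic, and predict_stress are hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np

# Assumption: the Whisper encoder emits one frame per 20 ms
# (1500 frames for a 30 s input window).
FRAME_SEC = 0.02


def pool_word_embeddings(frames, word_spans, frame_sec=FRAME_SEC):
    """Mean-pool encoder frames (T x D) over each word's (start, end) span in seconds."""
    n_frames = len(frames)
    pooled = []
    for start, end in word_spans:
        # Round to the nearest frame index and keep at least one frame per word.
        lo = min(int(round(start / frame_sec)), n_frames - 1)
        hi = min(max(lo + 1, int(round(end / frame_sec))), n_frames)
        pooled.append(frames[lo:hi].mean(axis=0))
    return np.stack(pooled)  # shape: (num_words, D)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a binary classifier (1 = stressed, 0 = unstressed) by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(stressed)
        err = p - y                              # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def predict_stress(X, w, b, threshold=0.5):
    """Return 0/1 stress labels for word embeddings X."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (p >= threshold).astype(int)


# Toy usage with synthetic "encoder frames": 3 s of audio, 8-dim features,
# six 0.5 s words; stressed words get a shift along dimension 0.
rng = np.random.default_rng(0)
frames = rng.normal(size=(150, 8))
word_spans = [(0.5 * i, 0.5 * (i + 1)) for i in range(6)]
labels = np.array([1, 0, 0, 1, 0, 1])
for i, stressed in enumerate(labels):
    if stressed:
        frames[25 * i:25 * (i + 1), 0] += 2.0

X = pool_word_embeddings(frames, word_spans)
w, b = train_logistic(X, labels.astype(float))
print(predict_stress(X, w, b))
```

Mean pooling and a linear classifier are used here only for brevity. The paper's actual decoder, regularization, and pooling choices are not described in the abstract, and a real system would extract frames from a pretrained Whisper encoder rather than from random noise.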
Anthology ID:
2025.rocling-main.33
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
314–319
URL:
https://aclanthology.org/2025.rocling-main.33/
Cite (ACL):
Ting-An Hung, Yu-Hsuan Hsieh, Tien-Hong Lo, Yung-Chang Hsu, and Berlin Chen. 2025. Exploring Sentence Stress Detection using Whisper-based Speech Models. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 314–319, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Exploring Sentence Stress Detection using Whisper-based Speech Models (Hung et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.33.pdf