Computational Approaches to Quantitative Analysis of Pause Duration in Taiwan Mandarin

I-Ping Wan, Yu-Ju Lai, Pu Yu


Abstract
This study presents a quantitative analysis of pause-duration patterns in a Mandarin spoken corpus to establish a baseline for prosodic and cognitive assessment. Drawing on cross-linguistic research, the study treats the distribution of pause patterns as reflecting multiple underlying factors. Longer pauses aligned with prosodic and syntactic boundaries indicate more deliberative and planned discourse rather than spontaneous speech. Such discourse places higher demands on cognitive and articulatory planning, producing extended thinking time as speakers handle complex topics and specialized terminology. The spoken corpus was automatically processed and annotated using an in-house alignment and pause-tagging pipeline. Outlier detection with a 3.0×IQR threshold retained 35,474 tokens and removed extreme values exceeding 1,016 ms. Short and medium pauses remained stable across mean, median, and variability measures. Long pauses showed a moderate reduction in token count (16,436 to 15,420): mean duration decreased from 535 to 426 ms and standard deviation fell sharply from 786 to 169 ms, while the median stayed around 370–380 ms. These findings demonstrate that automatic cleaning primarily removed aberrant values while preserving linguistically meaningful long pauses. This baseline from non-impaired adult speakers underscores the need for corpus-specific frameworks and offers a reference point for cross-linguistic research on speech planning.
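The outlier step described in the abstract can be illustrated with a minimal sketch. It assumes a single upper fence at Q3 + 3.0×IQR computed over all pause durations; the abstract does not say which quantile method was used or whether fences were set per pause category, so those choices, and the function names `iqr_upper_fence`, `remove_long_outliers`, and `summarize`, are hypothetical rather than the authors' pipeline.

```python
import statistics


def iqr_upper_fence(durations_ms, k=3.0):
    """Return Q3 + k * IQR for a list of pause durations in milliseconds.

    Assumption: quartiles use the 'inclusive' method; the paper's abstract
    does not specify one.
    """
    q1, _, q3 = statistics.quantiles(durations_ms, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def remove_long_outliers(durations_ms, k=3.0):
    """Drop durations above the upper fence; return (kept, fence)."""
    fence = iqr_upper_fence(durations_ms, k)
    kept = [d for d in durations_ms if d <= fence]
    return kept, fence


def summarize(durations_ms):
    """Token count, mean, median, and sample standard deviation."""
    return {
        "n": len(durations_ms),
        "mean": statistics.mean(durations_ms),
        "median": statistics.median(durations_ms),
        "sd": statistics.stdev(durations_ms),
    }


if __name__ == "__main__":
    # Toy durations (ms), not data from the corpus.
    toy = [100, 200, 300, 400, 500, 5000]
    kept, fence = remove_long_outliers(toy)
    print("fence:", fence)
    print("before:", summarize(toy))
    print("after:", summarize(kept))
```

Comparing `summarize` before and after filtering mirrors the kind of comparison the abstract reports: a few extreme values inflate the mean and standard deviation far more than the median, so removing them shrinks the former while leaving the latter nearly unchanged.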
Anthology ID:
2025.rocling-main.14
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
116–123
URL:
https://aclanthology.org/2025.rocling-main.14/
Cite (ACL):
I-Ping Wan, Yu-Ju Lai, and Pu Yu. 2025. Computational Approaches to Quantitative Analysis of Pause Duration in Taiwan Mandarin. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 116–123, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Computational Approaches to Quantitative Analysis of Pause Duration in Taiwan Mandarin (Wan et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.14.pdf