Data mining Mandarin tone contour shapes

Shuo Zhang


Abstract
In spontaneous speech, Mandarin tones that belong to the same tone category may exhibit many different contour shapes. We explore the use of time-series data mining techniques for understanding the variability of tones in a large corpus of Mandarin newscast speech. First, we adapt a graph-based approach to characterize the clusters (fuzzy types) of tone contour shapes observed in each tone n-gram category. Second, we show correlations between these realized contour shape clusters and a bag of automatically extracted linguistic features. We discuss the implications of the current study within the context of phonological and information theory.
Anthology ID: W19-4217
Volume: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month: August
Year: 2019
Address: Florence, Italy
Editors: Garrett Nicolai, Ryan Cotterell
Venue: ACL
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 144–153
URL: https://aclanthology.org/W19-4217/
DOI: 10.18653/v1/W19-4217
Cite (ACL): Shuo Zhang. 2019. Data mining Mandarin tone contour shapes. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 144–153, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Data mining Mandarin tone contour shapes (Zhang, ACL 2019)
PDF: https://aclanthology.org/W19-4217.pdf