Mushaffa Rasyid Ridha

2024

Refining rtMRI Landmark-Based Vocal Tract Contour Labels with FCN-Based Smoothing and Point-to-Curve Projection
Mushaffa Rasyid Ridha | Sakriani Sakti
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Advanced real-time Magnetic Resonance Imaging (rtMRI) enables researchers to study dynamic articulatory movements during speech production with high temporal resolution. However, accurately outlining articulator contours in high-frame-rate rtMRI presents challenges due to data scalability and image quality issues, making manual and automatic labeling difficult. The widely used publicly available USC-TIMIT dataset offers rtMRI data with landmark-based contour labels derived from unsupervised region segmentation using spatial frequency domain representation and gradient descent optimization. Unfortunately, occasional labeling errors exist, and many contour detection methods were trained and tested based on this ground truth, which is not purely a gold label, with the resulting contour data largely remaining undisclosed to the public. This paper offers a refinement of landmark-based vocal-tract contour labels by employing outlier removal, full convolutional network (FCN)-based smoothing, and a landmark point-to-edge curve projection technique. Since there is no established ground truth label, we evaluate the quality of the new labels through subjective assessments of several contour areas, comparing them to the existing data labels.

Co-authors

Sakriani Sakti 1

Venues

COLING1
LREC1

Fix author