2024
pdf
bib
abs
An Effective Pronunciation Assessment Approach Leveraging Hierarchical Transformers and Pre-training Strategies
Bi-Cheng Yan
|
Jiun-Ting Li
|
Yi-Cheng Wang
|
Hsin Wei Wang
|
Tien-Hong Lo
|
Yung-Chang Hsu
|
Wei-Cheng Chao
|
Berlin Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automatic pronunciation assessment (APA) manages to quantify a second language (L2) learner’s pronunciation proficiency in a target language by providing fine-grained feedback with multiple pronunciation aspect scores at various linguistic levels. Most existing efforts on APA typically parallelize the modeling process, namely predicting multiple aspect scores across various linguistic levels simultaneously. This inevitably makes both the hierarchy of linguistic units and the relatedness among the pronunciation aspects sidelined. Recognizing such a limitation, we in this paper first introduce HierTFR, a hierarchal APA method that jointly models the intrinsic structures of an utterance while considering the relatedness among the pronunciation aspects. We also propose a correlation-aware regularizer to strengthen the connection between the estimated scores and the human annotations. Furthermore, novel pre-training strategies tailored for different linguistic levels are put forward so as to facilitate better model initialization. An extensive set of empirical experiments conducted on the speechocean762 benchmark dataset suggest the feasibility and effectiveness of our approach in relation to several competitive baselines.
2023
pdf
bib
Auxiliary loss to attention head for end to end speaker diarization
Yi-Ting Yang
|
Jiun-Ting Li
|
Berlin Chen
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
2021
pdf
bib
abs
A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning
You-Sheng Tsao
|
Tien-Hong Lo
|
Jiun-Ting Li
|
Shi-Yan Weng
|
Berlin Chen
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
With the widespread commercialization of smart devices, research on environmental sound classification has gained more and more attention in recent years. In this paper, we set out to make effective use of large-scale audio pretrained model and semi-supervised model training paradigm for environmental sound classification. To this end, an environmental sound classification method is first put forward, whose component model is built on top a large-scale audio pretrained model. Further, to simulate a low-resource sound classification setting where only limited supervised examples are made available, we instantiate the notion of transfer learning with a recently proposed training algorithm (namely, FixMatch) and a data augmentation method (namely, SpecAugment) to achieve the goal of semi-supervised model training. Experiments conducted on bench-mark dataset UrbanSound8K reveal that our classification method can lead to an accuracy improvement of 2.4% in relation to a current baseline method.