2025
Multimodal Approaches for Stress Recognition: A Comparative Study Using the StressID Dataset
Chia-Yun Lee | Matúš Pleva | Daniel Hladek | Ming-Hsiang Su
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Mental health concerns have garnered increasing attention, highlighting the importance of timely and accurate identification of individual stress states as a critical research domain. This study employs the multimodal StressID dataset to evaluate the contributions of three modalities—physiological signals, video, and audio—in stress recognition tasks. A set of machine learning models, including Random Forests (RF), Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP), and K-Nearest Neighbors (KNN), were trained and tested with optimized parameters for each modality. In addition, the effectiveness of different multimodal fusion strategies was systematically examined. The unimodal experiments revealed that the physiological modality achieved the highest performance in the binary stress classification task (F1-score = 0.751), whereas the audio modality outperformed the others in the three-class classification task (F1-score = 0.625). In the multimodal setting, feature-level fusion yielded stable improvements in the binary classification task, while decision-level fusion achieved superior performance in the three-class classification task (F1-score = 0.65). These findings demonstrate that multimodal integration can substantially enhance the accuracy of stress recognition. Future research directions include incorporating temporal modeling and addressing data imbalance to further improve the robustness and applicability of stress recognition systems.
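The two fusion strategies compared in this abstract can be illustrated with a minimal sketch. The abstract does not state which combination rule was used, so the version below assumes plain concatenation for feature-level fusion and unweighted probability averaging for decision-level fusion; the function names and the modality keys are hypothetical.

```python
def feature_level_fusion(features_by_modality):
    """Concatenate per-modality feature vectors into one vector for a single classifier.

    Modalities are visited in sorted key order so the layout is deterministic.
    """
    fused = []
    for name in sorted(features_by_modality):
        fused.extend(features_by_modality[name])
    return fused


def decision_level_fusion(probs_by_modality):
    """Average the class probabilities predicted by one classifier per modality.

    Returns (predicted class index, averaged probabilities).
    Assumption: unweighted averaging; the paper may use a different rule.
    """
    modalities = list(probs_by_modality.values())
    n_classes = len(modalities[0])
    avg = [sum(p[c] for p in modalities) / len(modalities) for c in range(n_classes)]
    predicted = max(range(n_classes), key=avg.__getitem__)
    return predicted, avg
```

In the feature-level case a single model sees all modalities at once, whereas in the decision-level case each modality keeps its own model and only the outputs are combined.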
Challenges and Limitations of the Multilingual Pre-trained Model Whisper on Low-Resource Languages: A Case Study of Hakka Speech Recognition
Pei-Chi Lan | Hsin-Tien Chiang | Ting-Chun Lin | Ming-Hsiang Su
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
This study investigates the practical performance and limitations of the multilingual pre-trained model Whisper in low-resource language settings, using a Hakka speech recognition challenge as a case study. In the preliminary phase, our team (Group G) achieved official scores of 75.58% in Character Error Rate (CER) and 100.97% in Syllable Error Rate (SER). However, in the final phase, both CER and Word Error Rate (WER) reached 100%. Through a retrospective analysis of system design and implementation, we identified three major sources of failure: (1) improper handling of long utterances, where only the first segment was decoded, causing content truncation; (2) inconsistent language prompting, fixed to “Chinese” instead of the Hakka target; and (3) lack of systematic verification in data alignment and submission generation, combined with inadequate evaluation setup. Based on these findings, we propose a set of practical guidelines covering long-utterance processing, language consistency checking, and data submission validation. The results highlight that in low-resource speech recognition tasks, poor data quality or flawed workflow design can cause severe degradation of model performance. This study underscores the importance of robust data and process management in ASR system development and provides concrete insights for future improvements and reproducibility.
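Error rates above 100%, such as the 100.97% SER reported in this abstract, are possible because the edit distance counts insertions as well as substitutions and deletions, while the denominator is only the reference length. The sketch below assumes the standard Levenshtein-based definition; the challenge's official scoring script is not described here, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters, syllables, or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by reference length; the result can exceed 1.0."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Passing strings gives a character-level rate, and passing lists of syllables or words gives the corresponding syllable- or word-level rate. A hypothesis that covers only the first part of a long utterance is penalized with one deletion for every missing reference unit.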
2023
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
Jheng-Long Wu | Ming-Hsiang Su
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
Fine-Tuning and Evaluation of Question Generation for Slovak Language
Ondrej Megela | Daniel Hladek | Matus Pleva | Ján Staš | Ming-Hsiang Su | Yuan-Fu Liao
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
Application of Deep Learning Technology to Predict Changes in Sea Level
Yi-Lin Hsieh | Ming-Hsiang Su
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
A Novel Named Entity Recognition Model Applied to Specialized Sequence Labeling
Ruei-Cyuan Su | Tzu-En Su | Ming-Hsiang Su | Matus Pleva | Daniel Hladek
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
SCU-MESCLab at ROCLING-2023 Shared Task: Named Entity Recognition Using Multiple Classifier Model
Tzu-En Su | Ruei-Cyuan Su | Ming-Hsiang Su | Tsung-Hsien Yang
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
2022
RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model
Ming-Hsiang Su | Chin-Wei Lee | Chi-Lun Hsu | Ruei-Cyuan Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
In this study, a named entity recognition (NER) system was constructed and applied to the identification of Chinese medicine names and disease names. The results can be further used in a human-machine dialogue system to provide people with correct Chinese medicine medication reminders. First, this study uses web crawlers to compile web resources into a Chinese medicine named entity corpus, collecting 1,097 articles, 1,412 disease names, and 38,714 Chinese medicine names. Then, we annotated each article with TCM names using the BIO tagging method. Finally, this study trains and evaluates BERT, ALBERT, RoBERTa, and GPT2 with BiLSTM and CRF. The experimental results show that the NER system combining RoBERTa with BiLSTM and CRF achieves the best performance, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.
SCU-MESCLab at ROCLING-2022 Shared Task: Named Entity Recognition Using BERT Classifier
Tsung-Hsien Yang | Ruei-Cyuan Su | Tzu-En Su | Sing-Seong Chong | Ming-Hsiang Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
In this study, a named entity recognition system is constructed and applied in the medical domain. Data is labeled in BIO format. For example, “muscle” would be labeled “B-BODY” and “I-BODY”, and “cough” would be “B-SYMP” and “I-SYMP”. All words outside the categories are marked with “O”. The Chinese HealthNER Corpus contains 30,692 sentences, of which 2,531 sentences are assigned to the validation set (dev) for this evaluation, and the conference provides another 3,204 sentences as the test set (test). We submitted three prediction runs, using BLSTM_CRF, Roberta+BLSTM_CRF, and a BERT Classifier, respectively. The BERT Classifier system submitted as RUN3 achieved the best prediction performance, with a precision of 80.18%, a recall of 78.3%, and an F1-score of 79.23.
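The BIO labeling described in this abstract can be sketched as follows. The two-tag examples suggest character-level tagging of two-character Chinese words; that reading, the helper name `bio_tags`, and the sample sentence are assumptions for illustration, not taken from the corpus.

```python
def bio_tags(text, spans):
    """Return one BIO tag per character.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Characters outside every span are tagged "O".
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining characters of the entity
    return tags


# Hypothetical sentence: "肌肉" (muscle) is a BODY entity, "咳嗽" (cough) is a SYMP entity.
sentence = "肌肉和咳嗽"
print(list(zip(sentence, bio_tags(sentence, [(0, 2, "BODY"), (3, 5, "SYMP")]))))
```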
2021
Speech Emotion Recognition Based on CNN+LSTM Model
Wei Mou | Pei-Hsuan Shen | Chu-Yun Chu | Yu-Cheng Chiu | Tsung-Hsien Yang | Ming-Hsiang Su
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
Due to the popularity of intelligent dialogue assistant services, speech emotion recognition has become increasingly important. In the communication between humans and machines, emotion recognition and emotion analysis can enhance the interaction between machines and humans. This study uses the CNN+LSTM model to implement speech emotion recognition (SER) processing and prediction. The experimental results show that the CNN+LSTM model achieves better performance than the traditional NN model.
Discussion on the relationship between elders’ daily conversations and cognitive executive function: using word vectors and regression models
Ming-Hsiang Su | Yu-An Ko | Man-Ying Wang
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
As the average life expectancy of Chinese people rises, the health care problems of the elderly are becoming more diverse, and the demand for long-term care is also increasing. Therefore, we need to consider how to help the elderly have a good quality of life and maintain their dignity. This research explores the natural language characteristics of people undergoing normal aging through a deep model. First, we collect data through focus groups so that the elders can naturally interact with other participants in the process. Then, through a word vector model and a regression model, an executive function prediction model based on dialogue data is established to help understand the degradation trajectory of executive function and provide early warning.
SoochowDS at ROCLING-2021 Shared Task: Text Sentiment Analysis Using BERT and LSTM
Ruei-Cyuan Su | Sig-Seong Chong | Tzu-En Su | Ming-Hsiang Su
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
In this shared task, this paper proposes a method that combines a BERT-based word vector model with an LSTM prediction model to predict the Valence and Arousal values of text. The BERT-based word vectors are 768-dimensional, and each word vector in the sentence is fed sequentially to the LSTM model for prediction. The experimental results show that our proposed method outperforms the Lasso Regression model.