Shih-Hung Wu - ACL Anthology

Shih-Hung Wu

Also published as: Shih-Hung Wu, Shih-hung Wu

2025

CYUT-NLP at ROCLING-2025 Shared Task: Valence–Arousal Prediction in Physicians’ Texts Using BERT, RAG, and Multi-Teacher Pseudo-Labeling
Yi-Min Jian | An Yu Hsiao | Shih-Hung Wu
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Accurately modeling physicians’ emotional states from self-reflection texts remains challenging due to the low-resource, domain-specific nature of medical corpora. The proposed workflow combines Retrieval-Augmented Generation (RAG) with multi-teacher pseudo-labeling to generate high-quality augmented data. This workflow enables effective cross-domain adaptation from general text corpora to professional medical texts. Evaluations on the ROCLING 2025 test set show substantial improvements over the best-performing baseline in Valence–Arousal prediction accuracy and model stability. Importantly, the workflow is domain-agnostic and provides a generalizable methodology for systematically transferring models to new, low-resource domains, making it applicable beyond medical text analysis.


CYUT at SemEval-2025 Task 6: Prompting with Precision – ESG Analysis via Structured Prompts
Shih-Hung Wu | Zhi-Hong Lin | Ping-Hsuan Lee
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

In response to the increasing need for efficient ESG verification, we propose an NLP framework that automates the evaluation of corporate sustainability claims. Our method integrates Retrieval-Augmented Generation, Chain-of-Thought reasoning, and structured prompt engineering to process and classify diverse, multilingual ESG disclosures. Evaluated in the SemEval-2025 PromiseEval competition, our system achieved top-tier performance: it secured first place on the public English leaderboard, performed strongly in the French track, and delivered marked improvements over conventional machine learning approaches. These results highlight the framework’s potential to offer a scalable, transparent, and robust solution for corporate ESG assessment.

2024


CYUT at SemEval-2024 Task 7: A Numerals Augmentation and Feature Enhancement Approach to Numeral Reading Comprehension
Tsz-yeung Lau | Shih-hung Wu
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study explores Task 2 of NumEval-2024, which is SemEval-2024 (Semantic Evaluation) Task 7, focusing on the Reading Comprehension of Numerals in Text (Chinese). The dataset used in this study is the Numeral-related Question Answering Dataset (NQuAD), and the model employed is BERT. Before model training, the data is preprocessed by applying Numerals Augmentation and Feature Enhancement to numerical entities, and the model is then fine-tuned. The resulting accuracy was 77.09%, a 7.14% improvement over the original NQuAD model, the Numeracy-Enhanced Model (NEMo).

2021


CYUT at ROCLING-2021 Shared Task: Based on BERT and MacBERT
Xie-Sheng Hong | Shih-Hung Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

This paper describes our system for the ROCLING 2021 shared task on dimensional sentiment analysis of educational texts. We submitted two runs to the final test, both using a standard regression model. Run1 uses the Chinese version of BERT as the base model, and Run2 uses an early version of MacBERT, the Chinese RoBERTa-like BERT model RoBERTa-wwm-ext. In both runs, the powerful pre-trained BERT model provides text embeddings that help train the regression model.

2020


Learning the Human Judgment for the Automatic Evaluation of Chatbot
Shih-Hung Wu | Sheng-Lun Chien
Proceedings of the Twelfth Language Resources and Evaluation Conference

It is hard to evaluate the quality of text generated by a generative dialogue system. Currently, dialogue evaluation relies on human judges to label the quality of the generated text, which is not a reusable mechanism that can give consistent evaluations to system developers. We believe it is easier to obtain consistent results when comparing two dialogues generated by two systems than when assigning a quality score to a single system at a time. In this paper, we propose a machine learning approach that reduces the effort of human evaluation by learning human judgments on comparisons between two dialogue systems. Trained on the human labeling results, the evaluation model learns which generative model is better in each dialogue context. System developers can therefore use it to compare fine-tuned models repeatedly without human labor. In our experiment, the agreement between the learned model and the human judges is 70%. The experiment compares two attention-based GRU-RNN generative models.


CYUT Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2020 CGED Shared Task
Shih-Hung Wu | Junwei Wang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

This paper reports our Chinese Grammatical Error Diagnosis system in the NLPTEA-2020 CGED shared task. In 2020, we submitted two runs using two approaches. The first combines conditional random fields (CRF) with a BERT-based deep-learning model. The second is a BERT-based deep-learning approach alone. The official results show that our run1 achieved the highest precision, 0.9875, with the lowest false positive rate, 0.0163, on detection, while run2 gives a more balanced performance.

