Shih-Hung Wu


2021

CYUT at ROCLING-2021 Shared Task: Based on BERT and MacBERT
Xie-Sheng Hong | Shih-Hung Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

This paper describes our system for the ROCLING 2021 shared task on dimensional sentiment analysis for educational texts. We submitted two runs in the final test. Both runs use a standard regression model. Run1 uses the Chinese version of BERT as the base, and Run2 uses an early version of MacBERT, the Chinese RoBERTa-like BERT model RoBERTa-wwm-ext. In both runs, the pre-trained BERT model provides the text embeddings used to train the regression model.

2020

CYUT Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2020 CGED Shared Task
Shih-Hung Wu | Junwei Wang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

This paper reports our Chinese Grammatical Error Diagnosis system in the NLPTEA-2020 CGED shared task. In 2020, we submitted two runs with two approaches. The first combines conditional random fields (CRF) with a BERT-based deep-learning model. The second uses a BERT-based deep-learning model alone. The official results show that our run1 achieved the highest precision, 0.9875, with the lowest false positive rate, 0.0163, on detection, while run2 gives a more balanced performance.

Learning the Human Judgment for the Automatic Evaluation of Chatbot
Shih-Hung Wu | Sheng-Lun Chien
Proceedings of the 12th Language Resources and Evaluation Conference

It is hard to evaluate the quality of the text generated by a generative dialogue system. Currently, dialogue evaluation relies on human judges to label the quality of the generated text. This is not a reusable mechanism that can give consistent evaluation for system developers. We believe that it is easier to get consistent results when comparing dialogues generated by two systems than to give a consistent quality score to one system at a time. In this paper, we propose a machine learning approach to reduce the effort of human evaluation by learning the human judgment on comparing two dialogue systems. Trained on the human labeling results, the evaluation model learns which generative model is better in each dialogue context. Thus, system developers can use it to compare fine-tuned models repeatedly without human labor. In our experiment, we find that the agreement between the learned model and human judges is 70%. The experiment compares two attention-based GRU-RNN generative models.

2019

基於Seq2Seq模型的中文文法錯誤診斷系統(A Chinese Grammatical Error Diagnosis System Based on Seq2Seq Model)
Jun-Wei Wang | Sheng-Lun Chien | Yi-Kun Chen | Shih-Hung Wu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2018

A Short Answer Grading System in Chinese by Support Vector Approach
Shih-Hung Wu | Wen-Feng Shih
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this paper, we report a short answer grading system for Chinese. We build a system based on standard machine learning approaches and test it on corpora translated from two publicly available English corpora. The experimental results on the two translated corpora are similar to those obtained in English.

CYUT-III Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2018 CGED Shared Task
Shih-Hung Wu | Jun-Wei Wang | Liang-Pu Chen | Ping-Che Yang
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper reports how we built a Chinese Grammatical Error Diagnosis system in the NLPTEA-2018 CGED shared task. In 2018, we submitted three runs with three different approaches. The first is a pattern-based approach using frequent error pattern matching. The second is a sequential labelling approach using conditional random fields (CRF). The third is a rewriting approach using a sequence-to-sequence (seq2seq) model. The three approaches have different properties that aim to optimize different performance metrics, and the formal run results show the differences we expected.

2017

CYUT at IJCNLP-2017 Task 3: System Report for Review Opinion Diversification
Shih-Hung Wu | Su-Yu Chang | Liang-Pu Chen
Proceedings of the IJCNLP 2017, Shared Tasks

Review Opinion Diversification (RevOpiD) 2017 is a shared task held at the International Joint Conference on Natural Language Processing (IJCNLP). The shared task aims at selecting the top-k reviews, as a summary, from a set of reviews. There are three subtasks in RevOpiD: helpfulness ranking, representativeness ranking, and exhaustive coverage ranking. This year, our team submitted runs from three models. We focus on ranking reviews based on their helpfulness. In the first two models, we use linear regression with two different loss functions: the first is least squares, and the second is cross entropy. The third run is a random baseline. For both k=5 and k=10, our second model gets the best scores in the official evaluation metrics.

2016

以語言模型評估學習者文句修改前後之流暢度(Using language model to assess the fluency of learners sentences edited by teachers)[In Chinese]
Guan-Ying Pu | Po-Lin Chen | Shih-Hung Wu
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

CYUT-III System at Chinese Grammatical Error Diagnosis Task
Po-Lin Chen | Shih-Hung Wu | Liang-Pu Chen | Ping-Che Yang
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper describes the CYUT-III system for grammar error detection in the 2016 NLP-TEA Chinese Grammar Error Detection (CGED) shared task. In this task, a system has to detect four types of errors: redundant word errors, missing word errors, word selection errors, and word ordering errors. Based on the conditional random fields (CRF) model, our system is a linear tagger that can detect the errors in learners’ essays. Since the system performance depends heavily on the features, in this paper we report how to integrate the collocation feature into the CRF model. Our system achieves the best detection accuracy and identification accuracy on the TOCFL dataset, which is in traditional Chinese. The same system also works well on the simplified Chinese HSK dataset.

2015

以語言模型判斷學習者文句流暢度(Analyzing Learners’ Writing Fluency Based on Language Model)[In Chinese]
Po-Lin Chen | Shih-Hung Wu
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015)

Chinese Grammatical Error Diagnosis by Conditional Random Fields
Po-Lin Chen | Shih-Hung Wu | Liang-Pu Chen | Ping-Che Yang | Ren-Dar Yang
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2014

Modeling the Helpful Opinion Mining of Online Consumer Reviews as a Classification Problem
Yi-Ching Zeng | Tsun Ku | Shih-Hung Wu | Liang-Pu Chen | Gwo-Dong Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 2, June 2014

2013

Modeling the Helpful Opinion Mining of Online Consumer Reviews as a Classification Problem
Yi-Ching Zeng | Shih-Hung Wu
Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)

Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013
Shih-Hung Wu | Chao-Lin Liu | Lung-Hao Lee
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

蘊涵句型分析於改進中文文字蘊涵識別系統 (Entailment Analysis for Improving Chinese Recognizing Textual Entailment System) [In Chinese]
Shan-Shun Yang | Shih-Hung Wu | Liang-Pu Chen | Hung-Sheng Chiu | Ren-Dar Yang
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

蘊涵句型分析於改進中文文字蘊涵識別系統 (Entailment Analysis for Improving Chinese Recognizing Textual Entailment System) [In Chinese]
Shan-Shun Yang | Shih-Hung Wu | Liang-Pu Chen | Hung-Sheng Chiu | Ren-Dar Yang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 4, December 2013-Special Issue on Selected Papers from ROCLING XXV

2012

Sentence Parsing with Double Sequential Labeling in Traditional Chinese Parsing Task
Shih-Hung Wu | Hsien-You Hsieh | Liang-Pu Chen
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

基於單語言機器翻譯技術改進中文文字蘊涵 (Improving Chinese Textural Entailment by Monolingual Machine Translation Technology) [In Chinese]
Shan-Shun Yang | Shih-Hung Wu | Liang-Pu Chen | Wen-Tai Hsieh | Seng-Cho T. Chou
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

2011

中文文字蘊涵系統之特徵分析 (Feature Analysis of Chinese Textual Entailment System) [In Chinese]
Wan-Chi Huang | Shih-Hung Wu | Liang-Pu Chen | Tsun Ku
ROCLING 2011 Poster Papers

以籠統查詢評估查詢擴展方法與線上搜尋引擎之資訊檢索效能 (Evaluating the Information Retrieval Performance of Query Expansion Method and On-line Search Engine on General Query) [In Chinese]
Chih-Chuan Hsu | Shih-Hung Wu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 16, Number 1-2, March/June 2011

2010

Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)
Shih-Hung Wu | Jeih-weih Hung
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

基於對照表以及語言模型之簡繁字體轉換 (Chinese Characters Conversion System based on Lookup Table and Language Model) [In Chinese]
Min-Hsiang Li | Shih-Hung Wu | Ping-che Yang | Tsun Ku
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

基於對照表以及語言模型之簡繁字體轉換 (Chinese Characters Conversion System based on Lookup Table and Language Model) [In Chinese]
Min-Hsiang Li | Shih-Hung Wu | Yi-Ching Zeng | Ping-che Yang | Tsun Ku
International Journal of Computational Linguistics & Chinese Language Processing, Volume 15, Number 1, March 2010

Improving the Template Generation for Chinese Character Error Detection with Confusion Sets
Yong-Zhi Chen | Shih-Hung Wu | Ping-che Yang | Tsun Ku
International Journal of Computational Linguistics & Chinese Language Processing, Volume 15, Number 2, June 2010

Reducing the False Alarm Rate of Chinese Character Error Detection and Correction
Shih-Hung Wu | Yong-Zhi Chen | Ping-che Yang | Tsun Ku | Chao-Lin Liu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

Capturing Errors in Written Chinese Words
Chao-Lin Liu | Kan-Wen Tien | Min-Hua Lai | Yi-Hsuan Chuang | Shih-Hung Wu
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Phonological and Logographic Influences on Errors in Written Chinese Words
Chao-Lin Liu | Kan-Wen Tien | Min-Hua Lai | Yi-Hsuan Chuang | Shih-Hung Wu
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

中文混淆字集應用於別字偵錯模板自動產生 (Chinese Confusion Word Set for Automatic Generation of Spelling Error Detecting Template) [In Chinese]
Yong-Zhi Chen | Shih-Hung Wu | Chia-Ching Lu | Tsun Ku
ROCLING 2009 Poster Papers

2008

Automatic Wikibook Prototyping via Mining Wikipedia
Jen-Liang Chou | Shih-Hung Wu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 4, December 2008

2006

An Evaluation of Adopting Language Model as the Checker of Preposition Usage
Shih-Hung Wu | Chen-Yu Su
Proceedings of the 18th Conference on Computational Linguistics and Speech Processing

2005

Applying Maximum Entropy to Robust Chinese Shallow Parsing
Shih-Hung Wu | Cheng-Wei Shih | Chia-Wei Wu | Tzong-Han Tsai | Wen-Lian Hsu
Proceedings of the 17th Conference on Computational Linguistics and Speech Processing

2004

The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0
Cheng-Wei Shih | Tzong-Han Tsai | Shih-Hung Wu | Chiu-Chen Hsieh | Wen-Lian Hsu
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model
Tzong-Han Tsai | Shih-Hung Wu | Cheng-Wei Lee | Cheng-Wei Shih | Wen-Lian Hsu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 9, Number 1, February 2004: Special Issue on Selected Papers from ROCLING XV

2003

Mencius: A Chinese Named Entity Recognizer Using Hybrid Model
Tzong-Han Tsai | Shih-Hung Wu | Wen-Lian Hsu
Proceedings of Research on Computational Linguistics Conference XV

Text Categorization Using Automatically Acquired Domain Ontology
Shih-Hung Wu | Tzong-Han Tsai | Wen-Lian Hsu
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages

2002

SOAT: A Semi-Automatic Domain Ontology Acquisition Tool from Chinese Corpus
Shih-Hung Wu | Wen-Lian Hsu
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

1998

An Assessment of Character-based Chinese News Filtering Using Latent Semantic Indexing
Shih-Hung Wu | Pey-Ching Yang | Von-Wun Soo
International Journal of Computational Linguistics & Chinese Language Processing, Volume 3, Number 2, August 1998

1997

An Assessment on Character-based Chinese News Filtering Using Latent Semantic Indexing
Shih-Hung Wu | Pey-Ching Yang | Von-Wun Soo
Proceedings of the 10th Research on Computational Linguistics International Conference