Sheng-Lun Chien
2020
Learning the Human Judgment for the Automatic Evaluation of Chatbot
Shih-Hung Wu | Sheng-Lun Chien
Proceedings of the Twelfth Language Resources and Evaluation Conference
It is hard to evaluate the quality of text generated by a generative dialogue system. Currently, dialogue evaluation relies on human judges to label the quality of the generated text, which is not a reusable mechanism and cannot give system developers consistent evaluations. We believe that it is easier to get consistent results when comparing two dialogues generated by two systems than when assigning a quality score to one system at a time. In this paper, we propose a machine learning approach that reduces the effort of human evaluation by learning the human judgment on comparisons between two dialogue systems. Trained on the human labeling results, the evaluation model learns which generative model is better in each dialogue context. System developers can therefore use it to compare fine-tuned models repeatedly without human labor. In our experiment, which compares two attention-based GRU-RNN generative models, the agreement between the learned model and the human judge is 70%.
2019
基於Seq2Seq模型的中文文法錯誤診斷系統 (A Chinese Grammatical Error Diagnosis System Based on Seq2Seq Model)
Jun-Wei Wang | Sheng-Lun Chien | Yi-Kun Chen | Shih-Hung Wu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)