Ryota Arashi
2026
Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge
Yuki Fujita | Yasunobu Sasaki | Ryota Arashi | Hokuto Ototake | Shinya Takahashi
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
To address challenges of objectivity and efficiency in evaluating the quality of generative AI chatbots, we developed an automatic evaluation framework based on the "LLM-as-a-judge" approach. A User Simulator, built with In-Context Learning and LoRA tuning, generated pseudo-conversation logs for the fan-engagement application OSHIAI. A Judge LLM then automatically evaluated these logs across six dimensions, and we verified that the method contributes to quality management in real-world services.
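The pipeline described in the abstract (simulated user, chatbot, judge) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the six dimensions, so `dimension_1` through `dimension_6` are placeholders, and `stub_user`, `stub_bot`, and `stub_judge` are hypothetical stand-ins for the LoRA-tuned User Simulator, the OSHIAI chatbot, and the Judge LLM.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Placeholder names (assumption): the abstract states there are six
# evaluation dimensions but does not say what they are.
DIMENSIONS = [f"dimension_{i}" for i in range(1, 7)]


@dataclass
class Turn:
    speaker: str  # "user" or "bot"
    text: str


def simulate_conversation(
    user_sim: Callable[[List[Turn]], str],
    chatbot: Callable[[List[Turn]], str],
    n_turns: int,
) -> List[Turn]:
    """Alternate simulated-user and chatbot messages to build one pseudo-log."""
    log: List[Turn] = []
    for _ in range(n_turns):
        log.append(Turn("user", user_sim(log)))
        log.append(Turn("bot", chatbot(log)))
    return log


def evaluate_logs(
    logs: List[List[Turn]],
    judge: Callable[[List[Turn]], Dict[str, float]],
) -> Dict[str, float]:
    """Score every log with the judge and average each dimension."""
    per_log = [judge(log) for log in logs]
    for scores in per_log:
        missing = set(DIMENSIONS) - set(scores)
        if missing:
            raise ValueError(f"judge omitted dimensions: {sorted(missing)}")
    return {d: mean(s[d] for s in per_log) for d in DIMENSIONS}


# Hypothetical stubs; in a real system each would wrap a model call.
def stub_user(log: List[Turn]) -> str:
    return f"user message {len(log) // 2 + 1}"


def stub_bot(log: List[Turn]) -> str:
    return f"reply to: {log[-1].text}"


def stub_judge(log: List[Turn]) -> Dict[str, float]:
    return {d: 3.0 for d in DIMENSIONS}


logs = [simulate_conversation(stub_user, stub_bot, n_turns=2) for _ in range(3)]
summary = evaluate_logs(logs, stub_judge)
```

Keeping the simulator, chatbot, and judge as plain callables means each stub can be swapped for a model-backed function without changing the aggregation logic.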