Ryota Arashi


2026

To address challenges in objectivity and efficiency in evaluating the quality of generative AI chatbots, we developed an automatic evaluation framework using the "LLM-as-a-judge" approach. A User Simulator, built with In-Context Learning and LoRA tuning, was employed to generate pseudo-conversation logs of the fan-engagement application OSHIAI. These logs were then automatically evaluated by a Judge LLM across six dimensions, and the contribution of this method to quality management in real-world services was verified.