Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge

Yuki Fujita, Yasunobu Sasaki, Ryota Arashi, Hokuto Ototake, Shinya Takahashi


Abstract
To make quality evaluation of generative AI chatbots more objective and efficient, we developed an automatic evaluation framework based on the "LLM-as-a-judge" approach. A User Simulator, built with In-Context Learning and LoRA tuning, generated pseudo-conversation logs for the fan-engagement application OSHIAI. A Judge LLM then automatically evaluated these logs across six dimensions, and we verified the method's contribution to quality management in real-world services.
Anthology ID:
2026.iwsds-1.13
Volume:
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
Month:
February
Year:
2026
Address:
Trento, Italy
Editors:
Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
Venue:
IWSDS
Publisher:
Association for Computational Linguistics
Pages:
123–127
URL:
https://aclanthology.org/2026.iwsds-1.13/
Cite (ACL):
Yuki Fujita, Yasunobu Sasaki, Ryota Arashi, Hokuto Ototake, and Shinya Takahashi. 2026. Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 123–127, Trento, Italy. Association for Computational Linguistics.
Cite (Informal):
Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge (Fujita et al., IWSDS 2026)
PDF:
https://aclanthology.org/2026.iwsds-1.13.pdf