Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge
Yuki Fujita, Yasunobu Sasaki, Ryota Arashi, Hokuto Ototake, Shinya Takahashi
Abstract
To address challenges in objectivity and efficiency in evaluating the quality of generative AI chatbots, we developed an automatic evaluation framework using the "LLM-as-a-judge" approach. A User Simulator, built with In-Context Learning and LoRA tuning, was employed to generate pseudo-conversation logs of the fan-engagement application OSHIAI. These logs were then automatically evaluated by a Judge LLM across six dimensions, and the contribution of this method to quality management in real-world services was verified.
- Anthology ID:
- 2026.iwsds-1.13
- Volume:
- Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
- Month:
- February
- Year:
- 2026
- Address:
- Trento, Italy
- Editors:
- Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
- Venue:
- IWSDS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 123–127
- URL:
- https://aclanthology.org/2026.iwsds-1.13/
- Bibkey:
- fujita-etal-2026-development
- Cite (ACL):
- Yuki Fujita, Yasunobu Sasaki, Ryota Arashi, Hokuto Ototake, and Shinya Takahashi. 2026. Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 123–127, Trento, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge (Fujita et al., IWSDS 2026)
- PDF:
- https://aclanthology.org/2026.iwsds-1.13.pdf
Export citation
@inproceedings{fujita-etal-2026-development,
title = "Development of an Evaluation System for a Fan-Engagement Chat Application Using {LLM}-as-a-Judge",
author = "Fujita, Yuki and
Sasaki, Yasunobu and
Arashi, Ryota and
Ototake, Hokuto and
Takahashi, Shinya",
editor = "Riccardi, Giuseppe and
Mousavi, Seyed Mahed and
Torres, Maria Ines and
Yoshino, Koichiro and
Callejas, Zoraida and
Chowdhury, Shammur Absar and
Chen, Yun-Nung and
Bechet, Frederic and
Gustafson, Joakim and
Damnati, G{\'e}raldine and
Papangelis, Alex and
D{'}Haro, Luis Fernando and
Mendon{\c{c}}a, John and
Bernardi, Raffaella and
Hakkani-Tur, Dilek and
Di Fabbrizio, Giuseppe {''}Pino{''} and
Kawahara, Tatsuya and
Alam, Firoj and
Tur, Gokhan and
Johnston, Michael",
booktitle = "Proceedings of the 16th International Workshop on Spoken Dialogue System Technology",
month = feb,
year = "2026",
address = "Trento, Italy",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.iwsds-1.13/",
pages = "123--127",
abstract = "To address challenges in objectivity and efficiency in evaluating the quality of generative {AI} chatbots, we developed an automatic evaluation framework using the ``{LLM}-as-a-judge'' approach. A User Simulator, built with In-Context Learning and {L}o{RA} tuning, was employed to generate pseudo-conversation logs of the fan-engagement application {OSHIAI}. These logs were then automatically evaluated by a Judge {LLM} across six dimensions, and the contribution of this method to quality management in real-world services was verified."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="fujita-etal-2026-development">
<titleInfo>
<title>Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yuki</namePart>
<namePart type="family">Fujita</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yasunobu</namePart>
<namePart type="family">Sasaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ryota</namePart>
<namePart type="family">Arashi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hokuto</namePart>
<namePart type="family">Ototake</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shinya</namePart>
<namePart type="family">Takahashi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-02</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 16th International Workshop on Spoken Dialogue System Technology</title>
</titleInfo>
<name type="personal">
<namePart type="given">Giuseppe</namePart>
<namePart type="family">Riccardi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seyed</namePart>
<namePart type="given">Mahed</namePart>
<namePart type="family">Mousavi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="given">Ines</namePart>
<namePart type="family">Torres</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Koichiro</namePart>
<namePart type="family">Yoshino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zoraida</namePart>
<namePart type="family">Callejas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shammur</namePart>
<namePart type="given">Absar</namePart>
<namePart type="family">Chowdhury</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Frederic</namePart>
<namePart type="family">Bechet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joakim</namePart>
<namePart type="family">Gustafson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Géraldine</namePart>
<namePart type="family">Damnati</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alex</namePart>
<namePart type="family">Papangelis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="given">Fernando</namePart>
<namePart type="family">D’Haro</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">John</namePart>
<namePart type="family">Mendonça</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Raffaella</namePart>
<namePart type="family">Bernardi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dilek</namePart>
<namePart type="family">Hakkani-Tur</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Giuseppe</namePart>
<namePart type="given">“Pino”</namePart>
<namePart type="family">Di Fabbrizio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tatsuya</namePart>
<namePart type="family">Kawahara</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Firoj</namePart>
<namePart type="family">Alam</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gokhan</namePart>
<namePart type="family">Tur</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Johnston</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Trento, Italy</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>To address challenges in objectivity and efficiency in evaluating the quality of generative AI chatbots, we developed an automatic evaluation framework using the “LLM-as-a-judge” approach. A User Simulator, built with In-Context Learning and LoRA tuning, was employed to generate pseudo-conversation logs of the fan-engagement application OSHIAI. These logs were then automatically evaluated by a Judge LLM across six dimensions, and the contribution of this method to quality management in real-world services was verified.</abstract>
<identifier type="citekey">fujita-etal-2026-development</identifier>
<location>
<url>https://aclanthology.org/2026.iwsds-1.13/</url>
</location>
<part>
<date>2026-02</date>
<extent unit="page">
<start>123</start>
<end>127</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge
%A Fujita, Yuki
%A Sasaki, Yasunobu
%A Arashi, Ryota
%A Ototake, Hokuto
%A Takahashi, Shinya
%Y Riccardi, Giuseppe
%Y Mousavi, Seyed Mahed
%Y Torres, Maria Ines
%Y Yoshino, Koichiro
%Y Callejas, Zoraida
%Y Chowdhury, Shammur Absar
%Y Chen, Yun-Nung
%Y Bechet, Frederic
%Y Gustafson, Joakim
%Y Damnati, Géraldine
%Y Papangelis, Alex
%Y D’Haro, Luis Fernando
%Y Mendonça, John
%Y Bernardi, Raffaella
%Y Hakkani-Tur, Dilek
%Y Di Fabbrizio, Giuseppe “Pino”
%Y Kawahara, Tatsuya
%Y Alam, Firoj
%Y Tur, Gokhan
%Y Johnston, Michael
%S Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
%D 2026
%8 February
%I Association for Computational Linguistics
%C Trento, Italy
%F fujita-etal-2026-development
%X To address challenges in objectivity and efficiency in evaluating the quality of generative AI chatbots, we developed an automatic evaluation framework using the “LLM-as-a-judge” approach. A User Simulator, built with In-Context Learning and LoRA tuning, was employed to generate pseudo-conversation logs of the fan-engagement application OSHIAI. These logs were then automatically evaluated by a Judge LLM across six dimensions, and the contribution of this method to quality management in real-world services was verified.
%U https://aclanthology.org/2026.iwsds-1.13/
%P 123-127
Markdown (Informal)
[Development of an Evaluation System for a Fan-Engagement Chat Application Using LLM-as-a-Judge](https://aclanthology.org/2026.iwsds-1.13/) (Fujita et al., IWSDS 2026)