KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark

Seongbo Jang, Seonghyeon Lee, Hwanjo Yu


Abstract
As language models are often deployed as chatbot assistants, it becomes important for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources or translate dialogues from other languages into Korean. We then structure these conversations into diverse test datasets, ranging from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure their foundational understanding of Korean dialogues. Experimental results indicate that there is significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote progress toward conversation-aware Korean language models.
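The response selection task mentioned above is, in spirit, a multiple-choice problem: given a dialogue context and several candidate replies, a model should rank the correct reply highest. Below is a minimal, illustrative sketch of such an evaluation via log-likelihood scoring with a causal language model; the model choice, function names, and toy example are our assumptions for illustration, not the paper's released evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice; any Korean causal LM on the Hugging Face Hub works.
MODEL_NAME = "skt/kogpt2-base-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def candidate_log_likelihood(context: str, candidate: str) -> float:
    """Sum of log-probabilities the model assigns to the candidate's
    tokens when it continues the dialogue context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log_probs[i] is the distribution over the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    # Keep only the candidate tokens (positions ctx_len .. end).
    # Tokenizer boundary effects can shift the split by a token; acceptable for a sketch.
    cand_lp = log_probs[ctx_len - 1:].gather(1, targets[ctx_len - 1:].unsqueeze(1))
    return cand_lp.sum().item()

def select_response(context: str, candidates: list[str]) -> int:
    """Index of the candidate the model considers most likely."""
    scores = [candidate_log_likelihood(context, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy usage: a dialogue context ("A: What did you do over the weekend? B:")
# with a topical reply and an off-topic distractor.
dialogue = "A: 주말에 뭐 했어?\nB: "
candidates = ["친구랑 영화 봤어.", "서울은 한국의 수도야."]
print(select_response(dialogue, candidates))  # expected: 0
```

Under this scheme, accuracy is simply the fraction of examples on which the gold reply receives the highest score.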
Anthology ID:
2024.lrec-main.865
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
9905–9925
URL:
https://aclanthology.org/2024.lrec-main.865
Cite (ACL):
Seongbo Jang, Seonghyeon Lee, and Hwanjo Yu. 2024. KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9905–9925, Torino, Italia. ELRA and ICCL.
Cite (Informal):
KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark (Jang et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.865.pdf