2024
TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models
Sunwoo Lee | Dhammiko Arya | Seung-Mo Cho | Gyoung-eun Han | Seokyoung Hong | Wonbeom Jang | Seojin Lee | Sohee Park | Sereimony Sek | Injee Song | Sungbin Yoon | Eric Davis
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
The telecommunications industry, characterized by its vast customer base and complex service offerings, necessitates a high level of domain expertise and proficiency in customer service center operations. Consequently, there is a growing demand for Large Language Models (LLMs) to augment the capabilities of customer service representatives. This paper introduces a methodology for developing a specialized Telecommunications LLM (Telco LLM) designed to enhance the efficiency of customer service agents and promote consistency in service quality across representatives. We present the construction process of TelBench, a novel dataset created for performance evaluation of customer service expertise in the telecommunications domain. We also evaluate various LLMs and demonstrate the ability to benchmark both proprietary and open-source LLMs on predefined telecommunications-related tasks, thereby establishing metrics that define telecommunications performance.
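The abstract stops short of code; as a hedged illustration only, benchmarking an LLM on a predefined task often reduces to a loop that queries the model and scores its predictions. Everything named below (query_model, the exact-match scoring) is a hypothetical stand-in, not TelBench's actual harness:

    def query_model(prompt: str) -> str:
        # Stand-in for an LLM client call; wire in a proprietary or
        # open-source model API here.
        raise NotImplementedError("plug in an LLM client")

    def evaluate(task_examples):
        # task_examples: list of {"prompt": str, "answer": str} dicts.
        # Scores simple exact-match accuracy over one task.
        correct = 0
        for example in task_examples:
            prediction = query_model(example["prompt"])
            correct += prediction.strip() == example["answer"].strip()
        return correct / len(task_examples)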
2023
What, When, and How to Ground: Designing User Persona-Aware Conversational Agents for Engaging Dialogue
Deuksin Kwon | Sunwoo Lee | Ki Hyun Kim | Seojin Lee | Taeyoon Kim | Eric Davis
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
This paper presents a method for building a personalized open-domain dialogue system to address the WWH (WHAT, WHEN, and HOW) problem for natural response generation in a commercial setting, where personalized dialogue responses are heavily interleaved with casual response turns. The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets to address the challenges of WWH in personalized, open-domain dialogue systems. Our work effectively balances dialogue fluency and the tendency to ground, while also introducing a response-type label to improve the controllability and explainability of grounded responses. The combination of these methods leads to more fluent conversations, as evidenced by both subjective human evaluations and objective evaluations.
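As an illustrative sketch of the weighted dataset blending idea (the corpora and weights below are hypothetical; the paper's actual datasets and ratios are not given in the abstract), each training example can be drawn from a source corpus in proportion to its blending weight:

    import random

    # Hypothetical corpora and weights: placeholders for the paper's
    # persona-grounded and casual dialogue datasets.
    corpora = {
        "persona_grounded": ["I love hiking too!", "Yes, I have two cats."],
        "casual_chitchat": ["How was your day?", "That sounds fun."],
    }
    weights = {"persona_grounded": 0.3, "casual_chitchat": 0.7}

    def sample_blended(n_examples):
        # Pick a source corpus per example according to the weights,
        # then draw an example from that corpus.
        names = list(corpora)
        probs = [weights[name] for name in names]
        sources = random.choices(names, weights=probs, k=n_examples)
        return [random.choice(corpora[src]) for src in sources]

    print(sample_blended(5))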
2022
KoBEST: Korean Balanced Evaluation of Significant Tasks
Myeongjun Jang | Dohyung Kim | Deuk Sin Kwon | Eric Davis
Proceedings of the 29th International Conference on Computational Linguistics
A well-formulated benchmark plays a critical role in spurring advancements in the natural language processing (NLP) field, as it allows objective and precise evaluation of diverse models. As modern language models (LMs) have become more elaborate and sophisticated, more difficult benchmarks that require linguistic knowledge and reasoning have been proposed. However, most of these benchmarks only support English, and great effort is necessary to construct benchmarks for other low-resource languages. To this end, we propose a new benchmark named Korean balanced evaluation of significant tasks (KoBEST), which consists of five Korean-language downstream tasks. Professional Korean linguists designed the tasks, which require advanced Korean linguistic knowledge. Moreover, our data is annotated entirely by humans and thoroughly reviewed to guarantee high data quality. We also provide baseline models and human performance results. Our dataset is available on the Hugging Face Hub.
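A minimal sketch of loading the released data with the Hugging Face datasets library, assuming the benchmark is published under the skt/kobest_v1 identifier with per-task configurations such as boolq:

    from datasets import load_dataset

    # Dataset ID and config name are assumptions based on the public
    # release; KoBEST's five tasks would each be a separate config.
    boolq = load_dataset("skt/kobest_v1", "boolq")
    print(boolq["train"][0])  # inspect one human-annotated example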
2007
High-accuracy Annotation and Parsing of CHILDES Transcripts
Kenji Sagae | Eric Davis | Alon Lavie | Brian MacWhinney | Shuly Wintner
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition