Yushan Qian


2023

Empathetic dialogue is an indispensable part of building harmonious social relationships and contributes to the development of a helpful AI. Previous approaches are mainly based on fine-tuning small-scale language models. With the advent of ChatGPT, the effectiveness of large language models (LLMs) in this field has attracted great attention. This work empirically investigates the performance of LLMs in generating empathetic responses and proposes three improvement methods: semantically similar in-context learning, two-stage interactive generation, and combination with a knowledge base. Extensive experiments show that LLMs benefit significantly from the proposed methods and achieve state-of-the-art performance in both automatic and human evaluations. Additionally, we explore the possibility of GPT-4 simulating human evaluators.
“From pre-trained language model (PLM) to large language model (LLM), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to thoroughly evaluate for two reasons. First of all, traditional NLP tasks become inadequate due to the excellent performance of LLM. Secondly, existing evaluation tasks are difficult to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluations. We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM’s evaluation.”