Characterised LLMs Affect its Evaluation of Summary and Translation

Yuan Lu; Yu-Ting Lin

doi:10.18653/v1/2023.eval4nlp-1.15

Abstract

Large Language Models (LLMs) are now widely used and have achieved significant results in text tasks such as generating summaries and translations. However, there is still room for improvement in evaluating the outputs of LLMs. In this paper, we propose a scoring system that assesses the quality of summaries and translations using multiple metrics. We also enhance the LLM’s performance in scoring tasks by assigning it different roles, effectively making it act as an expert. We test four roles in the study: a teacher, a proofreader, a travel writer, and an internet troll, and compare the advantages and disadvantages of each role in the scoring task. Our results demonstrate that emphasizing multilingual capabilities and strict standards in the LLM’s identity can effectively boost its performance. Additionally, giving the LLM a more critical identity enhances its performance in translation tasks compared to a milder identity. In summary, we show that assigning different identities to an LLM can influence its performance in scoring tasks. We believe that this research will contribute to the use of LLMs for scoring purposes.

Anthology ID: 2023.eval4nlp-1.15
Volume: Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
Month: November
Year: 2023
Address: Bali, Indonesia
Editors: Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé
Venues: Eval4NLP | WS
Publisher: Association for Computational Linguistics
Pages: 184–192
URL: https://aclanthology.org/2023.eval4nlp-1.15
DOI: 10.18653/v1/2023.eval4nlp-1.15
Cite (ACL): Yuan Lu and Yu-Ting Lin. 2023. Characterised LLMs Affect its Evaluation of Summary and Translation. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, pages 184–192, Bali, Indonesia. Association for Computational Linguistics.
Cite (Informal): Characterised LLMs Affect its Evaluation of Summary and Translation (Lu & Lin, Eval4NLP-WS 2023)
PDF: https://aclanthology.org/2023.eval4nlp-1.15.pdf
