Exploring Prompting Large Language Models as Explainable Metrics

Ghazaleh Mahmoudi


Abstract
This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We propose a zero-shot, prompt-based strategy for explainable evaluation of summarization using Large Language Models (LLMs). Our experiments, which employ both zero-shot and few-shot approaches, demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly for summarization. On the test data, our best prompts achieve a Kendall correlation of 0.477 with human judgments on the text summarization task.
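The setup described in the abstract (prompting an LLM to score a summary and then measuring Kendall correlation with human judgments) can be illustrated with a minimal sketch. This is not the paper's method or prompts: the prompt wording, the `query_llm` helper, and the score-parsing logic below are hypothetical placeholders; only the Kendall-tau computation (via `scipy.stats.kendalltau`) reflects the reported evaluation measure.

```python
import re
from scipy.stats import kendalltau

# Hypothetical zero-shot evaluation prompt; the paper's actual prompts differ.
PROMPT_TEMPLATE = (
    "Source document:\n{source}\n\n"
    "Summary:\n{summary}\n\n"
    "Rate the quality of the summary on a scale from 0 to 100 and briefly "
    "explain your rating. Begin your answer with 'Score: <number>'."
)

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any instruction-tuned LLM (assumed API)."""
    raise NotImplementedError

def score_summary(source: str, summary: str) -> float:
    """Prompt the LLM once (zero-shot) and parse the numeric score from its reply."""
    reply = query_llm(PROMPT_TEMPLATE.format(source=source, summary=summary))
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", reply)
    return float(match.group(1)) if match else 0.0

def correlate_with_humans(pairs, human_scores):
    """Kendall correlation between LLM-assigned scores and human judgments."""
    llm_scores = [score_summary(src, summ) for src, summ in pairs]
    tau, _ = kendalltau(llm_scores, human_scores)
    return tau
```

In this sketch the explanation requested in the prompt is what makes the metric "explainable": the model returns both a score and a rationale, and only the score is used for the correlation.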
Anthology ID: 2023.eval4nlp-1.18
Volume: Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
Month: November
Year: 2023
Address: Bali, Indonesia
Editors: Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé
Venues: Eval4NLP | WS
Publisher: Association for Computational Linguistics
Pages: 219–227
URL: https://aclanthology.org/2023.eval4nlp-1.18
DOI: 10.18653/v1/2023.eval4nlp-1.18
Cite (ACL): Ghazaleh Mahmoudi. 2023. Exploring Prompting Large Language Models as Explainable Metrics. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, pages 219–227, Bali, Indonesia. Association for Computational Linguistics.
Cite (Informal): Exploring Prompting Large Language Models as Explainable Metrics (Mahmoudi, Eval4NLP-WS 2023)
PDF: https://aclanthology.org/2023.eval4nlp-1.18.pdf