LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task

Pavan Baswani, Ananya Mukherjee, Manish Shrivastava


Abstract
In this report, we share our contribution to the Eval4NLP Shared Task titled “Prompting Large Language Models as Explainable Metrics.” We build our prompts with a primary focus on effective prompting strategies, score-aggregation, and explainability for LLM-based metrics. We participated in the track for smaller models by submitting the scores along with their explanations. According to the Kendall correlation scores on the leaderboard, our MT evaluation submission ranks second-best, while our summarization evaluation submission ranks fourth, with only a 0.06 difference from the leading submission.
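The abstract describes prompting an LLM for quality scores and evaluating those scores via Kendall correlation with human judgments on the leaderboard. As a rough, hypothetical sketch of that evaluation loop (the prompt wording, the example model replies, and the score values below are illustrative assumptions, not the authors' actual setup), the score parsing and segment-level Kendall's tau can be computed with SciPy:

```python
# Minimal sketch, not the authors' pipeline: a scoring prompt template,
# parsing of numeric scores from LLM replies, and Kendall correlation
# against human judgments. All prompts, replies, and scores are made up.
import re
from scipy.stats import kendalltau

PROMPT = (
    "Rate the quality of the translation from 0 (worst) to 100 (best).\n"
    "Source: {src}\nTranslation: {hyp}\nScore:"
)

def parse_score(llm_reply: str) -> float:
    """Pull the first number out of the model's free-text reply."""
    match = re.search(r"\d+(?:\.\d+)?", llm_reply)
    return float(match.group()) if match else 0.0

# Hypothetical model replies and human direct-assessment scores.
replies = ["Score: 82", "55 - several mistranslations", "I would give this 91."]
llm_scores = [parse_score(r) for r in replies]   # -> [82.0, 55.0, 91.0]
human_scores = [80, 60, 95]

tau, p_value = kendalltau(llm_scores, human_scores)
print(f"Segment-level Kendall tau = {tau:.3f}")
```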
Anthology ID: 2023.eval4nlp-1.13
Volume: Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
Month: November
Year: 2023
Address: Bali, Indonesia
Editors: Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé
Venues: Eval4NLP | WS
Publisher: Association for Computational Linguistics
Pages: 156–163
URL: https://aclanthology.org/2023.eval4nlp-1.13
DOI: 10.18653/v1/2023.eval4nlp-1.13
Cite (ACL): Pavan Baswani, Ananya Mukherjee, and Manish Shrivastava. 2023. LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, pages 156–163, Bali, Indonesia. Association for Computational Linguistics.
Cite (Informal): LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task (Baswani et al., Eval4NLP-WS 2023)
PDF: https://aclanthology.org/2023.eval4nlp-1.13.pdf