A LLM-based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation

Irune Zubiaga; Aitor Soroa; Rodrigo Agerri

Abstract

This paper proposes a novel approach to evaluate Counter Narrative (CN) generation using a Large Language Model (LLM) as an evaluator. We show that traditional automatic metrics correlate poorly with human judgements and fail to capture the nuanced relationship between generated CNs and human perception. To alleviate this, we introduce a model ranking pipeline based on pairwise comparisons of generated CNs from different models, organized in a tournament-style format. The proposed evaluation method achieves a high correlation with human preference, with a ρ score of 0.88. As an additional contribution, we leverage LLMs as zero-shot CN generators and provide a comparative analysis of chat, instruct, and base models, exploring their respective strengths and limitations. Through meticulous evaluation, including fine-tuning experiments, we elucidate the differences in performance and responsiveness to domain-specific data. We conclude that chat-aligned models in zero-shot are the best option for carrying out the task, provided they do not refuse to generate an answer due to security concerns.
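The tournament-style ranking described above can be illustrated with a short sketch. This is not the authors' implementation. It assumes a round-robin format in which every pair of models is compared once per hate-speech instance, with models ordered by total wins. It also assumes that ρ denotes Spearman's rank correlation between the automatic ranking and a human ranking. `tournament_ranking`, `spearman_rho` and `toy_judge` are hypothetical names. `toy_judge` is a placeholder heuristic that stands in for the LLM evaluator, and the model names and outputs are toy data.

```python
from itertools import combinations
from typing import Callable, Dict, List

# judge(hate_speech, name_a, cn_a, name_b, cn_b) -> name of the preferred model.
# Hypothetical signature: in the paper's setting an LLM plays this role.
Judge = Callable[[str, str, str, str, str], str]


def tournament_ranking(
    hate_speech: List[str],
    outputs: Dict[str, List[str]],
    judge: Judge,
) -> List[str]:
    """Rank models by total wins in a round-robin of pairwise comparisons."""
    wins = {model: 0 for model in outputs}
    # Every unordered pair of models meets once per hate-speech instance.
    for model_a, model_b in combinations(sorted(outputs), 2):
        for i, hs in enumerate(hate_speech):
            winner = judge(hs, model_a, outputs[model_a][i], model_b, outputs[model_b][i])
            wins[winner] += 1
    # Most wins first; ties are broken alphabetically so the result is deterministic.
    return sorted(outputs, key=lambda m: (-wins[m], m))


def spearman_rho(ranking_x: List[str], ranking_y: List[str]) -> float:
    """Spearman's rho for two tie-free rankings of the same items (n > 1)."""
    n = len(ranking_x)
    pos_y = {item: r for r, item in enumerate(ranking_y)}
    d_squared = sum((r - pos_y[item]) ** 2 for r, item in enumerate(ranking_x))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def toy_judge(hs: str, name_a: str, cn_a: str, name_b: str, cn_b: str) -> str:
    # Placeholder heuristic (prefers the longer reply); a real pipeline would
    # prompt an LLM with the hate speech and both counter-narratives instead.
    return name_a if len(cn_a) >= len(cn_b) else name_b


# Toy, hypothetical data: two hate-speech instances and three models.
hate_speech = ["hs_1", "hs_2"]
outputs = {
    "model_a": ["ok", "fine"],
    "model_b": ["a longer reply", "another longer reply"],
    "model_c": ["medium", "medium!"],
}
ranking = tournament_ranking(hate_speech, outputs, toy_judge)
human_ranking = ["model_b", "model_a", "model_c"]  # hypothetical human preference
print(ranking)  # ['model_b', 'model_c', 'model_a']
print(spearman_rho(ranking, human_ranking))  # 0.5
```

A round-robin keeps every pairwise comparison. The number of judge calls therefore grows quadratically with the number of models, which is the main cost of this design.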

Anthology ID: 2024.findings-emnlp.559
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9572–9585
URL: https://aclanthology.org/2024.findings-emnlp.559
Cite (ACL): Irune Zubiaga, Aitor Soroa, and Rodrigo Agerri. 2024. A LLM-based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9572–9585, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): A LLM-based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation (Zubiaga et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.559.pdf