Argument Summarization and its Evaluation in the Era of Large Language Models

Moritz Altemeyer; Steffen Eger; Johannes Daxenberger; Yanran Chen; Tim Altendorf; Philipp Cimiano; Benjamin Schiller

doi:10.18653/v1/2025.emnlp-main.1797

Abstract

Large Language Models (LLMs) have revolutionized various Natural Language Generation (NLG) tasks, including Argument Summarization (ArgSum), a key subfield of Argument Mining. This paper investigates the integration of state-of-the-art LLMs into ArgSum systems and their evaluation. In particular, we propose a novel prompt-based evaluation scheme, and validate it through a novel human benchmark dataset. Our work makes three main contributions: (i) the integration of LLMs into existing ArgSum systems, (ii) the development of two new LLM-based ArgSum systems, benchmarked against prior methods, and (iii) the introduction of an advanced LLM-based evaluation scheme. We demonstrate that the use of LLMs substantially improves both the generation and evaluation of argument summaries, achieving state-of-the-art results and advancing the field of ArgSum. We also show that among the four LLMs integrated in (i) and (ii), Qwen-3-32B, despite having the fewest parameters, performs best, even surpassing GPT-4o.

Anthology ID: 2025.emnlp-main.1797
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 35490–35511
URL: https://aclanthology.org/2025.emnlp-main.1797/
DOI: 10.18653/v1/2025.emnlp-main.1797
Cite (ACL): Moritz Altemeyer, Steffen Eger, Johannes Daxenberger, Yanran Chen, Tim Altendorf, Philipp Cimiano, and Benjamin Schiller. 2025. Argument Summarization and its Evaluation in the Era of Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35490–35511, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Argument Summarization and its Evaluation in the Era of Large Language Models (Altemeyer et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1797.pdf
Checklist: 2025.emnlp-main.1797.checklist.pdf
