T5Score: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets

Itamar Trainin; Omri Abend

doi:10.18653/v1/2025.findings-acl.1351

Abstract

Using LLMs for Multi-Document Topic Extraction has recently gained popularity due to their apparent high-quality outputs, expressiveness, and ease of use. However, most existing evaluation practices are not designed for LLM-generated topics and result in low inter-annotator agreement scores, hindering the reliable use of LLMs for the task. To address this, we introduce T⁵Score, an evaluation methodology that decomposes the quality of a topic set into quantifiable aspects, measurable through easy-to-perform annotation tasks. This framing enables a convenient, manual or automatic, evaluation procedure resulting in a strong inter-annotator agreement score. To substantiate our methodology and claims, we perform extensive experimentation on multiple datasets and report the results.

Anthology ID: 2025.findings-acl.1351
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26347–26375
URL: https://aclanthology.org/2025.findings-acl.1351/
DOI: 10.18653/v1/2025.findings-acl.1351
Cite (ACL): Itamar Trainin and Omri Abend. 2025. T5Score: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26347–26375, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): T5Score: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets (Trainin & Abend, Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1351.pdf
