An Empirical Study of Many-to-Many Summarization with Large Language Models

Jiaan Wang; Fandong Meng; Zengkui Sun; Yunlong Liang; Yuxuan Cao; Jiarong Xu; Haoxiang Shi; Jie Zhou

doi:10.18653/v1/2025.acl-long.555

An Empirical Study of Many-to-Many Summarization with Large Language Models

Jiaan Wang, Fandong Meng, Zengkui Sun, Yunlong Liang, Yuxuan Cao, Jiarong Xu, Haoxiang Shi, Jie Zhou

Abstract

Many-to-many summarization (M2MS) aims to process documents in any language and generate the corresponding summaries also in any language. Recently, large language models (LLMs) have shown strong multi-lingual abilities, giving them the potential to perform M2MS in real applications. This work presents a systematic empirical study on LLMs’ M2MS ability. Specifically, we first reorganize M2MS data based on eight previous domain-specific datasets. The reorganized data contains 47.8K samples spanning five domains and six languages, which could be used to train and evaluate LLMs. Then, we benchmark 18 LLMs in a zero-shot manner and an instruction-tuning manner. Fine-tuned traditional models (e.g., mBART) are also conducted for comparisons. Our experiments reveal that, zero-shot LLMs achieve competitive results with fine-tuned traditional models. After instruct-tuning, open-source LLMs can significantly improve their M2MS ability, and outperform zero-shot LLMs (including GPT-4) in terms of automatic evaluations. In addition, we demonstrate this task-specific improvement does not sacrifice the LLMs’ general task-solving abilities. However, as revealed by our human evaluation, LLMs still face the factuality issue, and the instruction tuning might intensify the issue. Thus, how to control factual errors becomes the key when building LLM summarizers in real applications, and is worthy to be noted in future research.

Anthology ID:: 2025.acl-long.555
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 11328–11344
Language:
URL:: https://aclanthology.org/2025.acl-long.555/
DOI:: 10.18653/v1/2025.acl-long.555
Bibkey:
Cite (ACL):: Jiaan Wang, Fandong Meng, Zengkui Sun, Yunlong Liang, Yuxuan Cao, Jiarong Xu, Haoxiang Shi, and Jie Zhou. 2025. An Empirical Study of Many-to-Many Summarization with Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11328–11344, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: An Empirical Study of Many-to-Many Summarization with Large Language Models (Wang et al., ACL 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.acl-long.555.pdf

PDF Cite Search Fix data