Word Matters: What Influences Domain Adaptation in Summarization?

Yinghao Li; Siyu Miao; He-Yan Huang; Yang Gao (扬 高)

doi:10.18653/v1/2024.acl-long.715

Abstract

Domain adaptation aims to enable Large Language Models (LLMs) to generalize effectively to domain datasets unseen during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influences and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation performance, analyzing the specific impact of ‘words’ in training data on summarization tasks. We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization, which is determined by two indicators: word-based compression rate and abstraction level. Our experiments show that, when dataset learning difficulty is taken into account, cross-domain overlap and performance gain in summarization tasks exhibit an approximately linear relationship, which is not directly related to the number of words. Based on this finding, it is possible to predict a model’s performance on unseen domain datasets without training. Source code and scripts are available at https://github.com/li-aolong/Word-Matters.
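
The two indicators named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so the definitions below are assumptions made for illustration only: compression rate is taken as the ratio of document words to summary words, and abstraction level as the fraction of summary words that do not occur in the document. The function names and the whitespace tokenization are hypothetical; the authors' repository should be consulted for the actual definitions.

```python
def compression_rate(document: str, summary: str) -> float:
    """Assumed definition: document word count divided by summary word count."""
    doc_words = document.split()
    sum_words = summary.split()
    if not sum_words:
        raise ValueError("summary must contain at least one word")
    return len(doc_words) / len(sum_words)


def abstraction_level(document: str, summary: str) -> float:
    """Assumed definition: share of summary words absent from the document."""
    doc_vocab = {w.lower() for w in document.split()}
    sum_words = [w.lower() for w in summary.split()]
    if not sum_words:
        raise ValueError("summary must contain at least one word")
    # Count summary words that never occur in the source document.
    novel = [w for w in sum_words if w not in doc_vocab]
    return len(novel) / len(sum_words)


if __name__ == "__main__":
    doc = "the cat sat on the mat near the door"
    summ = "a cat sat"
    print(compression_rate(doc, summ))
    print(abstraction_level(doc, summ))
```

Under these assumed definitions, a higher compression rate and a higher abstraction level would both indicate a dataset that is harder to learn for generative summarization.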

Anthology ID: 2024.acl-long.715
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13236–13249
URL: https://aclanthology.org/2024.acl-long.715/
DOI: 10.18653/v1/2024.acl-long.715
Cite (ACL): Yinghao Li, Siyu Miao, Heyan Huang, and Yang Gao. 2024. Word Matters: What Influences Domain Adaptation in Summarization? In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13236–13249, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Word Matters: What Influences Domain Adaptation in Summarization? (Li et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.715.pdf
