LASS: A Novel and Economical Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization

Yanyue Zhang, Pengfei Li, Yilong Lai, Yulan He, Deyu Zhou


Abstract
As more than 70% of the reviews in existing opinion summarization datasets are positive, current opinion summarization approaches are reluctant to generate negative summaries even when the input texts are negative. To address this sentiment bias, a direct approach that does not rely on a specific model structure is to generate additional data with large language models to balance the sentiment distribution of the dataset. However, large-scale data augmentation with large language models has an obvious drawback: its high cost. Therefore, in this paper, we propose LASS, a novel data augmentation framework based on both LArge and Small language models for debiaSing opinion summarization. Specifically, a small number of synthetic negative reviews are obtained by rewriting positive texts with a large language model. Then, a disentangled reconstruction model is trained on the generated data. After training, a large amount of synthetic data can be obtained by decoding new representations formed by combining the representations of different samples, and then filtering the outputs based on perplexity and sentiment classification. Experiments show that LASS effectively alleviates sentiment bias, comparably to using only large language models, but more economically.
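The abstract describes a generate-then-filter pipeline. The Python sketch below illustrates only its last two steps, under loudly stated assumptions: each sample's representation is assumed to split into a content part and a sentiment part, "combining" representations is assumed to mean keeping one sample's content and another's sentiment, and the decoder is not modeled, so each decoded candidate simply arrives with hypothetical per-token log-probabilities. Perplexity is computed with the standard definition (exponential of the negative mean log-probability). The names `Disentangled`, `combine`, `filter_candidates`, `toy_classifier`, and the threshold `max_ppl` are all hypothetical and are not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Disentangled:
    """Hypothetical disentangled representation: separate content and sentiment parts."""
    content: Tuple[float, ...]
    sentiment: Tuple[float, ...]


def combine(a: Disentangled, b: Disentangled) -> Disentangled:
    """Assumed combination rule: keep a's content, take b's sentiment."""
    return Disentangled(content=a.content, sentiment=b.sentiment)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_candidates(
    candidates: List[Tuple[str, List[float]]],
    classify: Callable[[str], str],
    max_ppl: float,
    target: str = "negative",
) -> List[str]:
    """Keep decoded texts that are fluent (perplexity <= max_ppl) and carry the target sentiment."""
    kept = []
    for text, logprobs in candidates:
        if perplexity(logprobs) <= max_ppl and classify(text) == target:
            kept.append(text)
    return kept


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a trained sentiment classifier (keyword lookup only)."""
    negative_words = {"bad", "not", "terrible"}
    return "negative" if negative_words & set(text.split()) else "positive"


# Combine a positive sample's content with a negative sample's sentiment.
pos = Disentangled(content=(0.2, 0.9), sentiment=(1.0,))
neg = Disentangled(content=(0.7, 0.1), sentiment=(-1.0,))
mixed = combine(pos, neg)

# Pretend these texts were decoded from mixed representations like the one above.
candidates = [
    ("the room was bad and noisy", [-1.0, -1.2, -0.8, -1.1, -0.9, -1.0]),  # fluent, negative
    ("the staff were friendly", [-0.5, -0.7, -0.6, -0.4]),                 # fluent, positive
    ("bad bad room room the the", [-4.0, -5.0, -4.5, -4.8, -5.2, -4.9]),   # disfluent
]
kept = filter_candidates(candidates, toy_classifier, max_ppl=10.0)
print(kept)
```

With these toy inputs, only the first candidate survives: the second is rejected by the sentiment check and the third by the perplexity threshold (its perplexity is roughly 114). Running both filters in sequence reflects the abstract's description that synthetic data is filtered by perplexity and by sentiment classification; the specific threshold here is an arbitrary illustration.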
Anthology ID:
2025.coling-main.412
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
6169–6183
URL:
https://aclanthology.org/2025.coling-main.412/
Cite (ACL):
Yanyue Zhang, Pengfei Li, Yilong Lai, Yulan He, and Deyu Zhou. 2025. LASS: A Novel and Economical Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6169–6183, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
LASS: A Novel and Economical Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization (Zhang et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.412.pdf