Analyzing the Dialect Diversity in Multi-document Summaries

Olubusayo Olabisi, Aaron Hudson, Antonie Jetter, Ameeta Agrawal


Abstract
Social media posts provide a compelling yet challenging source of data, offering diverse perspectives from many socially salient groups. Automatic text summarization algorithms make this data accessible at scale by compressing large collections of documents into short summaries that preserve salient information from the source text. In this work, we take a complementary approach to analyzing and improving the quality of summaries generated from social media data in terms of their ability to represent salient as well as diverse perspectives. We introduce a novel dataset, DivSumm, of dialect-diverse tweets and human-written extractive and abstractive summaries. We then study the extent of dialect diversity reflected in human-written reference summaries as well as system-generated summaries. The results of our extensive experiments suggest that human annotators produce fairly well-balanced, dialect-diverse summaries, and that cluster-based pre-processing approaches seem beneficial in improving the overall quality of system-generated summaries without loss in diversity.
Anthology ID:
2022.coling-1.542
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6208–6221
URL:
https://aclanthology.org/2022.coling-1.542
Cite (ACL):
Olubusayo Olabisi, Aaron Hudson, Antonie Jetter, and Ameeta Agrawal. 2022. Analyzing the Dialect Diversity in Multi-document Summaries. In Proceedings of the 29th International Conference on Computational Linguistics, pages 6208–6221, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Analyzing the Dialect Diversity in Multi-document Summaries (Olabisi et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.542.pdf
Code:
portnlp/divsumm