Towards Summarizing Healthcare Questions in Low-Resource Setting

Shweta Yadav, Cornelia Caragea


Abstract
Current advances in abstractive document summarization depend largely on large human-annotated datasets. However, creating large-scale datasets is often not feasible in closed domains, such as the medical and healthcare domains, where human annotation requires domain expertise. This paper presents a novel data selection strategy to generate diverse and semantic questions in a low-resource setting, with the aim of summarizing healthcare questions. Our method uses guided semantic-overlap and diversity-based objective functions to optimally select an informative and diverse set of synthetic samples for data augmentation. Extensive experiments on benchmark healthcare question summarization datasets demonstrate the effectiveness of the proposed data selection strategy, which achieves new state-of-the-art results. Our human evaluation shows that our method generates diverse, fluent, and informative summarized questions.
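The abstract does not define the semantic-overlap and diversity objectives. As a rough illustration of the general idea only, the sketch below uses a generic greedy relevance-minus-redundancy rule, in the spirit of maximal marginal relevance. It selects synthetic questions that overlap with the source question but not with questions already chosen. Token-level Jaccard overlap stands in for semantic overlap. This substitution is an assumption. The function `select_samples`, the weight `lam`, and the example questions are hypothetical and are not the authors' method.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens (a crude stand-in for semantic representations)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_samples(source: str, candidates: list[str], k: int, lam: float = 0.5) -> list[str]:
    """Hypothetical greedy selection: reward overlap with the source question,
    penalize overlap with candidates that have already been selected."""
    src = tokens(source)
    cand_toks = [tokens(c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = jaccard(cand_toks[i], src)
            redundancy = max((jaccard(cand_toks[i], cand_toks[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


# Illustrative (made-up) source question and synthetic candidates.
source = "what are the side effects of taking metformin for diabetes"
candidates = [
    "what are the side effects of metformin",
    "what are side effects of metformin",
    "is metformin safe for diabetes patients",
    "how do i cook rice",
]
print(select_samples(source, candidates, k=2))
```

With `lam=0.5`, the near-duplicate second candidate is skipped in favor of the less redundant third one. Raising `lam` trades diversity for closeness to the source.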
Anthology ID: 2022.coling-1.255
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2892–2905
URL: https://aclanthology.org/2022.coling-1.255
Cite (ACL): Shweta Yadav and Cornelia Caragea. 2022. Towards Summarizing Healthcare Questions in Low-Resource Setting. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2892–2905, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Towards Summarizing Healthcare Questions in Low-Resource Setting (Yadav & Caragea, COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.255.pdf