Towards Summarizing Healthcare Questions in Low-Resource Setting

Shweta Yadav; Cornelia Caragea

Abstract

Current advances in abstractive document summarization depend to a large extent on sizable human-annotated datasets. However, creating large-scale datasets is often not feasible in closed domains, such as the medical and healthcare domains, where human annotation requires domain expertise. This paper presents a novel data selection strategy to generate diverse and semantic questions in a low-resource setting with the aim of summarizing healthcare questions. Our method exploits the concept of guided semantic-overlap and diversity-based objective functions to optimally select an informative and diverse set of synthetic samples for data augmentation. Our extensive experiments on benchmark healthcare question summarization datasets demonstrate the effectiveness of our proposed data selection strategy, which achieves new state-of-the-art results. Our human evaluation shows that our method generates diverse, fluent, and informative summarized questions.

Anthology ID: 2022.coling-1.255
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2892–2905
URL: https://aclanthology.org/2022.coling-1.255/
Cite (ACL): Shweta Yadav and Cornelia Caragea. 2022. Towards Summarizing Healthcare Questions in Low-Resource Setting. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2892–2905, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Towards Summarizing Healthcare Questions in Low-Resource Setting (Yadav & Caragea, COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.255.pdf
