BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech

Alvi Khan, Fida Kamal, Mohammad Abrar Chowdhury, Tasnim Ahmed, Md Tahmid Rahman Laskar, Sabbir Ahmed


Abstract
Online health consultation is steadily gaining popularity as a platform for patients to discuss their medical health inquiries, known as Consumer Health Questions (CHQs). The emergence of the COVID-19 pandemic has also led to a surge in the use of such platforms, creating a significant burden for the limited number of healthcare professionals attempting to respond to the influx of questions. Abstractive text summarization is a promising solution to this challenge, since shortening CHQs to only the information essential to answering them reduces the amount of time spent parsing unnecessary information. The summarization process can also serve as an intermediate step towards the eventual development of an automated medical question-answering system. This paper presents ‘BanglaCHQ-Summ’, the first CHQ summarization dataset for the Bangla language, consisting of 2,350 question-summary pairs. It is benchmarked on state-of-the-art Bangla and multilingual text generation models, with the best-performing model, BanglaT5, achieving a ROUGE-L score of 48.35%. In addition, we address the limitations of existing automatic metrics for summarization by conducting a human evaluation. The dataset and all relevant code used in this work have been made publicly available.
Anthology ID:
2023.banglalp-1.10
Volume:
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, Ruhul Amin
Venue:
BanglaLP
Publisher:
Association for Computational Linguistics
Pages:
85–93
URL:
https://aclanthology.org/2023.banglalp-1.10
DOI:
10.18653/v1/2023.banglalp-1.10
Cite (ACL):
Alvi Khan, Fida Kamal, Mohammad Abrar Chowdhury, Tasnim Ahmed, Md Tahmid Rahman Laskar, and Sabbir Ahmed. 2023. BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 85–93, Singapore. Association for Computational Linguistics.
Cite (Informal):
BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech (Khan et al., BanglaLP 2023)
PDF:
https://aclanthology.org/2023.banglalp-1.10.pdf
Video:
https://aclanthology.org/2023.banglalp-1.10.mp4