Unsupervised Bengali Text Summarization Using Sentence Embedding and Spectral Clustering

Sohini Roychowdhury, Kamal Sarkar, Arka Maji


Abstract
Single-document extractive text summarization produces a condensed version of a document by extracting its salient sentences. The most significant and diverse information in a document can be captured by partitioning its sentences into topical clusters. Spectral clustering is well suited to text summarization because it does not assume a fixed cluster shape, and the number of clusters can be inferred automatically with the eigengap heuristic. In our approach, we use word embedding-based sentence representations and a spectral clustering algorithm to identify the topics covered in a Bengali document, and we generate an extractive summary by selecting salient sentences from the identified topics. We compare the resulting Bengali summarization system with several baseline extractive summarization systems. The experimental results show that the proposed approach outperforms some of the baseline Bengali summarization systems with which it is compared.
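The pipeline outlined in the abstract has four stages: sentence embeddings, a similarity graph, spectral clustering with an eigengap estimate of the number of topics, and one salient sentence per topic. It can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following: sentence vectors are the average of pre-trained word vectors over whitespace tokens; the affinity is cosine similarity clipped at zero; the Laplacian is the symmetric normalized one; clusters come from plain k-means on row-normalized eigenvectors; and the "salient" sentence of each cluster is the one nearest the cluster centroid. The function names (`embed_sentences`, `affinity_matrix`, `spectral_embedding`, `kmeans`, `summarize`) and the `max_k` cap are hypothetical.

```python
import numpy as np


def embed_sentences(sentences, word_vectors):
    """Average the vectors of known whitespace tokens (assumed representation)."""
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for sentence in sentences:
        vecs = [word_vectors[t] for t in sentence.split() if t in word_vectors]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(rows)


def affinity_matrix(X):
    """Cosine similarity clipped at zero, with self-loops removed."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    A = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(A, 0.0)
    return A


def spectral_embedding(A, max_k):
    """Choose k by the largest eigengap of the symmetric normalized Laplacian."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d == 0, 1.0, d))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    gaps = np.diff(vals[: max_k + 1])  # gaps[i] = vals[i + 1] - vals[i]
    k = int(np.argmax(gaps)) + 1
    U = vecs[:, :k]
    # Row-normalize so each sentence is a unit vector in the spectral space.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return k, U


def kmeans(U, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def summarize(sentences, word_vectors, max_k=5):
    """Return one representative sentence per spectral cluster, in document order."""
    if len(sentences) <= 2:
        return list(sentences)  # too short to cluster meaningfully
    X = embed_sentences(sentences, word_vectors)
    k, U = spectral_embedding(affinity_matrix(X), min(max_k, len(sentences) - 1))
    labels = kmeans(U, k)
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        centroid = X[idx].mean(axis=0)
        # Assumed salience criterion: the sentence closest to the topic centroid.
        picked.append(int(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]))
    return [sentences[i] for i in sorted(picked)]


if __name__ == "__main__":
    # Toy vectors: English tokens stand in for Bengali words; two orthogonal topics.
    wv = {
        "cricket": np.array([1.0, 0.0, 0.0, 0.0]),
        "bat": np.array([1.0, 0.1, 0.0, 0.0]),
        "rain": np.array([0.0, 0.0, 1.0, 0.0]),
        "cloud": np.array([0.0, 0.0, 1.0, 0.1]),
    }
    doc = ["cricket bat", "bat", "cricket", "rain cloud", "cloud", "rain"]
    print(summarize(doc, wv))
```

On the toy document, the similarity graph splits into two components, so the largest gap falls after the second eigenvalue and the sketch returns one sentence per topic. The eigengap rule picks k as the position of the largest jump among the smallest Laplacian eigenvalues, so the number of topics does not have to be fixed in advance. A realistic system would also need a Bengali tokenizer, real embeddings, and a summary-length budget, all of which this sketch omits.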
Anthology ID: 2022.icon-main.40
Volume: Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2022
Address: New Delhi, India
Editors: Md. Shad Akhtar, Tanmoy Chakraborty
Venue: ICON
Publisher: Association for Computational Linguistics
Pages: 337–346
URL: https://aclanthology.org/2022.icon-main.40
Cite (ACL):
Sohini Roychowdhury, Kamal Sarkar, and Arka Maji. 2022. Unsupervised Bengali Text Summarization Using Sentence Embedding and Spectral Clustering. In Proceedings of the 19th International Conference on Natural Language Processing (ICON), pages 337–346, New Delhi, India. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Bengali Text Summarization Using Sentence Embedding and Spectral Clustering (Roychowdhury et al., ICON 2022)
PDF: https://aclanthology.org/2022.icon-main.40.pdf