Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting

Lal Daisy, Rayson Paul, Singh Krishna, Tiwary Uma Shanker


Abstract
The Internet has led to a surge in text data in Indian languages; hence, text summarization tools have become essential for information retrieval. Due to a lack of data resources, prevailing summarization systems for Indian languages have been primarily dependent on and derived from English text summarization approaches. Despite Hindi being the most widely spoken language in India, progress in Hindi summarization is being delayed by the lack of proper labeled datasets. In this preliminary work, we address two major challenges in abstractive Hindi text summarization: creating Hindi language summaries and assessing the efficacy of the produced summaries. Since transfer learning (TL) has been shown to be effective in low-resource settings, we assess the effectiveness of a TL-based approach for summarizing Hindi text through a comparative analysis of three encoder-decoder models: an attention-based model (BASE), a multi-level model (MED), and a TL-based model (RETRAIN). To address the second challenge, we introduce the ICE-H evaluation metric, which is based on the ICE metric for assessing English language summaries. The ROUGE and ICE-H metrics are used to evaluate the BASE, MED, and RETRAIN models. According to the ROUGE results, the RETRAIN model produces slightly better abstracts than the BASE and MED models for 20k and 100k training samples. The ICE-H metric, on the other hand, produces inconclusive results, which may be attributed to the limitations of existing Hindi NLP resources, such as word embeddings and POS taggers.
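As an illustration of the ROUGE-style evaluation mentioned in the abstract, the following is a minimal sketch of ROUGE-N computed as n-gram overlap between a candidate summary and a reference summary. It is not the authors' evaluation code. The function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are assumptions made for illustration only; a real Hindi pipeline would likely need a proper tokenizer and normalization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 for two strings.

    Assumption: tokens are separated by whitespace, which is only a
    rough approximation for Hindi text.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the paper's dataset.
    reference = "भारत ने मैच जीत लिया"
    candidate = "भारत ने मैच जीता"
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

Because this score depends on exact token matches, an abstractive summary that paraphrases the reference can score low even when it is faithful, which is one motivation for a complementary metric such as ICE-H.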
Anthology ID:
2023.icon-1.58
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
603–612
URL:
https://aclanthology.org/2023.icon-1.58
Cite (ACL):
Lal Daisy, Rayson Paul, Singh Krishna, and Tiwary Uma Shanker. 2023. Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 603–612, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting (Daisy et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.58.pdf