A Scaled Encoder Decoder Network for Image Captioning in Hindi

Santosh Kumar Mishra; Sriparna Saha; Pushpak Bhattacharyya

Abstract

Image captioning, the automatic generation of natural language descriptions for images, is a prominent research area spanning computer vision and natural language processing. Most existing work has focused on developing image captioning models for English. This paper introduces a novel deep learning architecture, based on an encoder-decoder with an attention mechanism, for image captioning in Hindi. Hindi, the fourth-most spoken language globally, is widely spoken in India and South Asia and is one of India’s official languages. Several deep learning-based architectures are explored for the encoder, decoder, and attention components. The proposed encoder-decoder architecture uses scaling in convolutional neural networks to achieve better accuracy than state-of-the-art image captioning methods in Hindi. Its performance is compared with state-of-the-art methods in terms of BLEU scores and manual evaluation of adequacy and fluency. The results demonstrate the efficacy of the proposed method.

Anthology ID: 2021.icon-main.30
Volume: Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2021
Address: National Institute of Technology Silchar, Silchar, India
Editors: Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 251–260
URL: https://aclanthology.org/2021.icon-main.30/
Cite (ACL): Santosh Kumar Mishra, Sriparna Saha, and Pushpak Bhattacharyya. 2021. A Scaled Encoder Decoder Network for Image Captioning in Hindi. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 251–260, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal): A Scaled Encoder Decoder Network for Image Captioning in Hindi (Mishra et al., ICON 2021)
PDF: https://aclanthology.org/2021.icon-main.30.pdf
