Automatic Myanmar Image Captioning using CNN and LSTM-Based Language Model

San Pa Pa Aung; Win Pa Pa; Tin Lay Nwe

Abstract

An image captioning system combines computer vision and natural language processing modules. The computer vision module detects salient objects or extracts image features, and the Natural Language Processing (NLP) module generates syntactically and semantically correct captions. Although many image caption datasets, such as Flickr8k, Flickr30k, and MSCOCO, are publicly available, most are captioned in English, and there is no image caption corpus for the Myanmar language. In this work, a Myanmar image caption corpus is manually built on part of the Flickr8k dataset. Furthermore, a generative merge model based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) is applied to Myanmar image captioning. Two conventional feature extraction models, the 16-layer and 19-layer Visual Geometry Group (VGG) OxfordNet, are then compared. The system is evaluated on the Myanmar image caption corpus using BLEU scores and 10-fold cross-validation.
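The evaluation protocol named in the abstract (BLEU scores under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `k_fold_indices` and `bleu1` are hypothetical, only the unigram case of BLEU is shown, and captions are assumed to be already tokenized into lists of strings (the abstract does not say how Myanmar text is segmented).

```python
import math
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k contiguous (train, test) index folds."""
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        folds.append((train, test))
        start += size
    return folds


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # For each token, the maximum count observed in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    clipped = sum(min(count, max_ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(candidate)
    # Closest reference length; ties go to the shorter reference.
    ref_len = min((len(r) for r in references),
                  key=lambda n: (abs(n - len(candidate)), n))
    # Penalize candidates that are shorter than the closest reference.
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * precision
```

In a full evaluation, a captioning model would be trained on each fold's training indices, scored on the held-out indices, and the scores averaged over the ten folds. Higher-order BLEU variants additionally combine clipped bigram to 4-gram precisions.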

Anthology ID: 2020.sltu-1.19
Volume: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Month: May
Year: 2020
Address: Marseille, France
Editors: Dorothee Beermann, Laurent Besacier, Sakriani Sakti, Claudia Soria
Venue: SLTU
Publisher: European Language Resources Association
Pages: 139–143
Language: English
URL: https://aclanthology.org/2020.sltu-1.19/
Cite (ACL): San Pa Pa Aung, Win Pa Pa, and Tin Lay Nwe. 2020. Automatic Myanmar Image Captioning using CNN and LSTM-Based Language Model. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 139–143, Marseille, France. European Language Resources Association.
Cite (Informal): Automatic Myanmar Image Captioning using CNN and LSTM-Based Language Model (Pa Pa Aung et al., SLTU 2020)
PDF: https://aclanthology.org/2020.sltu-1.19.pdf