Generative Models For Indic Languages: Evaluating Content Generation Capabilities

Savita Bhat, Vasudeva Varma, Niranjan Pedanekar


Abstract
Large language models (LLMs) and generative AI have emerged as the most important areas in the field of natural language processing (NLP). LLMs are considered a key component in several NLP tasks, such as summarization, question-answering, sentiment classification, and translation. Newer LLMs, such as ChatGPT, BLOOMZ, and similar variants, are trained on multilingual data and hence are expected to process and generate text in multiple languages. Considering the widespread use of LLMs, evaluating their efficacy in multilingual settings is imperative. In this work, we evaluate the newest generative models (ChatGPT, mT0, and BLOOMZ) in the context of Indic languages. Specifically, we consider natural language generation (NLG) applications such as summarization and question-answering in monolingual and cross-lingual settings. We observe that current generative models have limited capability for generating text in Indic languages in a zero-shot setting. In contrast, generative models perform consistently better on manual quality-based evaluation for both Indic-language and English-language generation. Given this limited generation performance, we argue that these LLMs are not intended to be used in a zero-shot fashion in downstream applications.
Anthology ID:
2023.ranlp-1.21
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
187–195
URL:
https://aclanthology.org/2023.ranlp-1.21
Cite (ACL):
Savita Bhat, Vasudeva Varma, and Niranjan Pedanekar. 2023. Generative Models For Indic Languages: Evaluating Content Generation Capabilities. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 187–195, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Generative Models For Indic Languages: Evaluating Content Generation Capabilities (Bhat et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.21.pdf