Towards Code-Mixed Hinglish Dialogue Generation

Vibhav Agarwal, Pooja Rao, Dinesh Babu Jayagopi


Abstract
Code-mixed language plays a crucial role in communication in multilingual societies. Though the recent growth of web users has greatly boosted the use of such mixed languages, the current generation of dialog systems is primarily monolingual. This increase in the use of code-mixed language has prompted the need for dialog systems that converse in a similar language. We present our work on Code-Mixed Dialog Generation, an unexplored task in code-mixed languages: generating utterances in a code-mixed language rather than in a single language, which is most often English. We present a new synthetic code-mixed dialog corpus, CM-DailyDialog, created by converting an existing English-only dialog corpus to a mixed Hindi-English corpus. We then propose a baseline approach in which we show the effectiveness of mBART-like multilingual sequence-to-sequence transformers for code-mixed dialog generation. Our best-performing dialog models can conduct coherent conversations in Hindi-English mixed language, as evaluated by human and automatic metrics, setting new benchmarks for the Code-Mixed Dialog Generation task.
Anthology ID:
2021.nlp4convai-1.26
Volume:
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | NLP4ConvAI
Publisher:
Association for Computational Linguistics
Pages:
271–280
URL:
https://aclanthology.org/2021.nlp4convai-1.26
DOI:
10.18653/v1/2021.nlp4convai-1.26
Cite (ACL):
Vibhav Agarwal, Pooja Rao, and Dinesh Babu Jayagopi. 2021. Towards Code-Mixed Hinglish Dialogue Generation. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, pages 271–280, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Code-Mixed Hinglish Dialogue Generation (Agarwal et al., NLP4ConvAI 2021)
PDF:
https://aclanthology.org/2021.nlp4convai-1.26.pdf