Towards Code-Mixed Hinglish Dialogue Generation

Vibhav Agarwal, Pooja Rao, Dinesh Babu Jayagopi


Abstract
Code-mixed language plays a crucial role in communication in multilingual societies. Though the recent growth of web users has greatly boosted the use of such mixed languages, the current generation of dialog systems is primarily monolingual. This increase in the use of code-mixed language motivates dialog systems that can converse in it. We present our work on Code-Mixed Dialog Generation, an unexplored task in code-mixed languages: generating utterances in a code-mixed language rather than in a single language, which is most often English. We present a new synthetic code-mixed dialog corpus, CM-DailyDialog, created by converting an existing English-only dialog corpus into a mixed Hindi-English corpus. We then propose a baseline approach in which we show the effectiveness of mBART-like multilingual sequence-to-sequence transformers for code-mixed dialog generation. Our best-performing dialog models can conduct coherent conversations in Hindi-English mixed language, as evaluated by human and automatic metrics, setting new benchmarks for the Code-Mixed Dialog Generation task.
Anthology ID:
2021.ranlp-srw.2
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2021
Month:
September
Year:
2021
Address:
Online
Editors:
Souhila Djabri, Dinara Gimadi, Tsvetomila Mihaylova, Ivelina Nikolova-Koleva
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
7–15
URL:
https://aclanthology.org/2021.ranlp-srw.2
Cite (ACL):
Vibhav Agarwal, Pooja Rao, and Dinesh Babu Jayagopi. 2021. Towards Code-Mixed Hinglish Dialogue Generation. In Proceedings of the Student Research Workshop Associated with RANLP 2021, pages 7–15, Online. INCOMA Ltd.
Cite (Informal):
Towards Code-Mixed Hinglish Dialogue Generation (Agarwal et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-srw.2.pdf
Data:
DailyDialog, LinCE, PHINC