Detecting Entailment in Code-Mixed Hindi-English Conversations

Sharanya Chakravarthy, Anjana Umapathy, Alan W Black


Abstract
The presence of large-scale corpora for Natural Language Inference (NLI) has spurred deep learning research in this area, though much of this research has focused solely on monolingual data. Code-mixing is the intertwined usage of multiple languages, and is commonly seen in informal conversations among polyglots. Given the rising importance of dialogue agents, it is imperative that they understand code-mixing, but the scarcity of code-mixed Natural Language Understanding (NLU) datasets has precluded research in this area. The dataset by Khanuja et al. for detecting conversational entailment in code-mixed Hindi-English text is the first of its kind. We investigate the effectiveness of language modeling, data augmentation, translation, and architectural approaches to address the code-mixed, conversational, and low-resource aspects of this dataset. We obtain an 8.09% increase in test set accuracy over the current state of the art.
Anthology ID:
2020.wnut-1.22
Volume:
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month:
November
Year:
2020
Address:
Online
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
165–170
URL:
https://aclanthology.org/2020.wnut-1.22
DOI:
10.18653/v1/2020.wnut-1.22
Cite (ACL):
Sharanya Chakravarthy, Anjana Umapathy, and Alan W Black. 2020. Detecting Entailment in Code-Mixed Hindi-English Conversations. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 165–170, Online. Association for Computational Linguistics.
Cite (Informal):
Detecting Entailment in Code-Mixed Hindi-English Conversations (Chakravarthy et al., WNUT 2020)
PDF:
https://aclanthology.org/2020.wnut-1.22.pdf
Code
sharanyarc96/hinglishnli
Data
MultiNLI, SNLI, XNLI