Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues

Tanmay Parekh, Emily Ahn, Yulia Tsvetkov, Alan W Black


Abstract
Code-switching is a ubiquitous phenomenon in multilingual communities. Natural language technologies that wish to communicate like humans must therefore adaptively incorporate code-switching techniques when they are deployed in multilingual settings. To this end, we propose a Hindi-English human-machine dialogue system that elicits code-switching conversations in a controlled setting. It uses different code-switching agent strategies to understand how users respond and accommodate to the agent’s language choice. Through this system, we collect and release a new dataset, CommonDost, comprising 439 human-machine multilingual conversations. We adapt pre-defined metrics to discover linguistic accommodation from users to agents. Finally, we compare these dialogues with Spanish-English dialogues collected in a similar setting and analyze the impact of linguistic and socio-cultural factors on code-switching patterns across the two language pairs.
Anthology ID:
2020.conll-1.46
Volume:
Proceedings of the 24th Conference on Computational Natural Language Learning
Month:
November
Year:
2020
Address:
Online
Editors:
Raquel Fernández, Tal Linzen
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
565–577
URL:
https://aclanthology.org/2020.conll-1.46
DOI:
10.18653/v1/2020.conll-1.46
Cite (ACL):
Tanmay Parekh, Emily Ahn, Yulia Tsvetkov, and Alan W Black. 2020. Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 565–577, Online. Association for Computational Linguistics.
Cite (Informal):
Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues (Parekh et al., CoNLL 2020)
PDF:
https://aclanthology.org/2020.conll-1.46.pdf
Code:
tanmayparekh/commondost