Isabelle G. Lee


GupShup: Summarizing Open-Domain Code-Switched Conversations
Laiba Mehnaz | Debanjan Mahata | Rakesh Gosangi | Uma Sushmitha Gunturi | Riya Jain | Gauri Gupta | Amardeep Kumar | Isabelle G. Lee | Anish Acharya | Rajiv Ratn Shah
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Code-switching is the communication phenomenon in which speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multilingual communities worldwide. It is therefore essential to develop techniques for understanding and summarizing these conversations. Toward this objective, we introduce the task of abstractive summarization of Hindi-English (Hi-En) code-switched conversations. We also develop the first code-switched conversation summarization dataset, GupShup, which contains over 6,800 Hi-En conversations and their corresponding human-annotated summaries in English (En) and Hi-En. We present a detailed account of the entire data collection and annotation process, and we analyze the dataset using various code-switching statistics. We train state-of-the-art abstractive summarization models and report their performance using both automated metrics and human evaluation. Our results show that multilingual mBART and multi-view seq2seq models perform best on this new dataset. We also conduct an extensive qualitative analysis to provide insight into the models and some of their shortcomings.