Code-switched inspired losses for spoken dialog representations

Pierre Colombo, Emile Chapuis, Matthieu Labeau, Chloé Clavel


Abstract
Spoken dialogue systems need to handle both multiple languages and multilinguality within a conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learning generic multilingual spoken dialogue representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus of multilingual conversations in five languages (French, Italian, English, German and Spanish) from OpenSubtitles, a huge multilingual corpus composed of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialogue act corpora in the same five languages, as well as on two novel multilingual tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that our new losses achieve better performance in both monolingual and multilingual settings.
Anthology ID:
2021.emnlp-main.656
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8320–8337
URL:
https://aclanthology.org/2021.emnlp-main.656
DOI:
10.18653/v1/2021.emnlp-main.656
Cite (ACL):
Pierre Colombo, Emile Chapuis, Matthieu Labeau, and Chloé Clavel. 2021. Code-switched inspired losses for spoken dialog representations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8320–8337, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Code-switched inspired losses for spoken dialog representations (Colombo et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.656.pdf
Data:
OpenSubtitles