Supervised and Unsupervised Evaluation of Synthetic Code-Switching

Evgeny Orlov, Ekaterina Artemova


Abstract
Code-switching (CS) is the phenomenon of mixing words and phrases from multiple languages within a single sentence or conversation. The ever-growing amount of CS communication among multilingual speakers on social media has highlighted the need to adapt existing NLP products for CS speakers and led to a rising interest in solving CS NLP tasks. A large number of contemporary approaches use synthetic CS data for training. As previous work has shown the positive effect of pretraining on high-quality CS data, the task of evaluating synthetic CS becomes crucial. In this paper, we address the task of evaluating synthetic CS in two settings. In the supervised setting, we apply Hinglish-finetuned models to solve the quality rating prediction task of the HinglishEval competition and establish a new SOTA. In the unsupervised setting, we employ the method of acceptability measures with the same models. We find that in both settings, models finetuned on CS data consistently outperform their original counterparts.
Anthology ID:
2022.wnut-1.13
Volume:
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
113–123
URL:
https://aclanthology.org/2022.wnut-1.13
Cite (ACL):
Evgeny Orlov and Ekaterina Artemova. 2022. Supervised and Unsupervised Evaluation of Synthetic Code-Switching. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022), pages 113–123, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Supervised and Unsupervised Evaluation of Synthetic Code-Switching (Orlov & Artemova, WNUT 2022)
PDF:
https://aclanthology.org/2022.wnut-1.13.pdf
Data:
CoLA