Intrinsic evaluation of language models for code-switching

Sik Feng Cheong, Hai Leong Chieu, Jing Lim


Abstract
Language models used in speech recognition are often evaluated either intrinsically, using perplexity on test data, or extrinsically, with an automatic speech recognition (ASR) system. The former evaluation does not always correlate well with ASR performance, while the latter can be specific to particular ASR systems. Recent work proposed to evaluate language models by using them to classify ground truth sentences among alternative phonetically similar sentences generated by a finite state transducer. Underlying such an evaluation is the assumption that the generated sentences are linguistically incorrect. In this paper, we first call this assumption into question, observing that the generated alternative sentences are often linguistically correct when they differ from the ground truth by only one edit. Second, we show that by using multilingual BERT, we can achieve better performance than previous work on two code-switching datasets. Our implementation is publicly available on GitHub at https://github.com/sikfeng/language-modelling-for-code-switching.
Anthology ID:
2021.wnut-1.10
Volume:
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Month:
November
Year:
2021
Address:
Online
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
81–86
URL:
https://aclanthology.org/2021.wnut-1.10
DOI:
10.18653/v1/2021.wnut-1.10
Cite (ACL):
Sik Feng Cheong, Hai Leong Chieu, and Jing Lim. 2021. Intrinsic evaluation of language models for code-switching. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 81–86, Online. Association for Computational Linguistics.
Cite (Informal):
Intrinsic evaluation of language models for code-switching (Cheong et al., WNUT 2021)
PDF:
https://aclanthology.org/2021.wnut-1.10.pdf
Code
 sikfeng/language-modelling-for-code-switching