MaskLID: Code-Switching Language Identification through Iterative Masking

Amir Hossein Kargaran, François Yvon, Hinrich Schuetze


Abstract
We present MaskLID, a simple yet effective code-switching (CS) language identification (LID) method. MaskLID requires no training and is designed to complement current high-performance sentence-level LIDs. Sentence-level LIDs are classifiers trained on monolingual texts to provide single labels, typically using a softmax layer to turn scores into probabilities. However, when a sentence is written in two languages L1 and L2, the LID classifier often returns only the dominant label L1. To address this limitation, MaskLID masks the text features associated with L1, allowing the LID to classify the remaining text as L2 in the next round. The method uses the LID itself to identify the features to mask and relies on no external resources. In this work, we explore the use of MaskLID with two open-source LIDs, GlotLID and OpenLID, both based on the FastText architecture. Code and demo are available at https://github.com/cisnlp/MaskLID.
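The iterative classify-then-mask loop described above can be sketched as follows. This is a hypothetical simplification, not the paper's implementation: a toy word-count scorer over tiny per-language vocabularies (`LEXICON`) stands in for the FastText-based LID, and the helper names `lid_scores` and `masklid` are illustrative.

```python
from collections import Counter

# Toy stand-in for a sentence-level LID: scores a text by counting
# words from small per-language vocabularies (hypothetical data,
# not GlotLID or OpenLID).
LEXICON = {
    "eng": {"the", "is", "good", "very", "movie"},
    "deu": {"der", "film", "ist", "sehr", "gut"},
}

def lid_scores(words):
    """Return per-language scores for the currently unmasked words."""
    return Counter({
        lang: sum(w in vocab for w in words)
        for lang, vocab in LEXICON.items()
    })

def masklid(text, max_rounds=2):
    """Iteratively: classify, record the top label, mask the features
    (here, words) that supported that label, and re-classify the rest."""
    words = text.lower().split()
    labels = []
    for _ in range(max_rounds):
        scores = lid_scores(words)
        lang, score = scores.most_common(1)[0]
        if score == 0:  # nothing recognizable is left
            break
        labels.append(lang)
        # Mask the features associated with the detected language,
        # so the next round can surface the secondary language.
        words = [w for w in words if w not in LEXICON[lang]]
    return labels
```

For a mixed sentence such as `masklid("the movie ist sehr gut")`, the first round detects the dominant German words, masks them, and the second round then labels the English remainder, yielding both languages instead of only one.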
Anthology ID:
2024.acl-short.43
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
459–469
URL:
https://aclanthology.org/2024.acl-short.43
Cite (ACL):
Amir Hossein Kargaran, François Yvon, and Hinrich Schuetze. 2024. MaskLID: Code-Switching Language Identification through Iterative Masking. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 459–469, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
MaskLID: Code-Switching Language Identification through Iterative Masking (Kargaran et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-short.43.pdf