Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings

The Ngo, Thi Anh Nguyen, My Ha, Thi Minh Nguyen, Phuong Le-Hong


Abstract
The VLP team participated in the DSL-ML shared task of the VarDial 2024 workshop, which aims to distinguish texts in similar languages. This paper presents our approach to the problem and discusses our experimental and official results. We propose integrating semantics-aware word embeddings, learned from ConceptNet, into a bidirectional long short-term memory network. This approach performs well: our system ranked among the top two or three teams for the task.
Anthology ID:
2024.vardial-1.21
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
235–240
URL:
https://aclanthology.org/2024.vardial-1.21
DOI:
10.18653/v1/2024.vardial-1.21
Cite (ACL):
The Ngo, Thi Anh Nguyen, My Ha, Thi Minh Nguyen, and Phuong Le-Hong. 2024. Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 235–240, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings (Ngo et al., VarDial-WS 2024)
PDF:
https://aclanthology.org/2024.vardial-1.21.pdf
Supplementary material:
 2024.vardial-1.21.SupplementaryMaterial.txt