Phonetic, Semantic, and Articulatory Features in Assamese-Bengali Cognate Detection

Abhijnan Nath, Rahul Ghosh, Nikhil Krishnaswamy


Abstract
In this paper, we propose a method to detect whether words in two similar languages, Assamese and Bengali, are cognates. We combine phonetic, semantic, and articulatory features and use the cognate detection task to analyze the relative informational contribution of each feature type in distinguishing words in the two languages. In addition, since support for low-resourced languages like Assamese can be weak or nonexistent in some multilingual language models, we create a monolingual Assamese Transformer model and explore augmenting multilingual models with monolingual models using affine transformation techniques between vector spaces.
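
To make the idea of an affine transformation between vector spaces concrete, here is a minimal sketch, not the authors' implementation. It assumes paired embeddings for the same items from a monolingual model and a multilingual model, and fits a map of the form target ≈ source · W + b by ordinary least squares. The function names, dimensions, and synthetic data are hypothetical and chosen only for illustration; the abstract does not specify how the transformation is estimated.

```python
import numpy as np


def fit_affine_map(source, target):
    """Least-squares fit of target ≈ source @ W + b (illustrative only)."""
    # Append a column of ones so the bias b is learned jointly with W.
    ones = np.ones((source.shape[0], 1))
    augmented = np.hstack([source, ones])
    solution, *_ = np.linalg.lstsq(augmented, target, rcond=None)
    # All rows but the last form W; the last row is the bias b.
    return solution[:-1], solution[-1]


def apply_affine_map(vectors, W, b):
    """Project vectors from the source space into the target space."""
    return vectors @ W + b


# Synthetic stand-ins for paired embeddings (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
mono = rng.normal(size=(200, 8))        # hypothetical monolingual vectors
true_W = rng.normal(size=(8, 5))
true_b = rng.normal(size=5)
multi = mono @ true_W + true_b          # hypothetical multilingual vectors

W, b = fit_affine_map(mono, multi)
mapped = apply_affine_map(mono, W, b)
```

With real embeddings the fit would not be exact, so the mapped vectors would typically be evaluated on held-out pairs rather than on the pairs used for fitting.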
Anthology ID:
2022.vardial-1.5
Volume:
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
41–53
URL:
https://aclanthology.org/2022.vardial-1.5
Cite (ACL):
Abhijnan Nath, Rahul Ghosh, and Nikhil Krishnaswamy. 2022. Phonetic, Semantic, and Articulatory Features in Assamese-Bengali Cognate Detection. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 41–53, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Phonetic, Semantic, and Articulatory Features in Assamese-Bengali Cognate Detection (Nath et al., VarDial 2022)
PDF:
https://aclanthology.org/2022.vardial-1.5.pdf
Optional supplementary material:
 2022.vardial-1.5.OptionalSupplementaryMaterial.zip