Karthikraja Tp
2024
NLP_Team1@SSN at SemEval-2024 Task 1: Impact of language models in Sentence-BERT for Semantic Textual Relatedness in Low-resource Languages
Senthil Kumar
|
Aravindan Chandrabose
|
Gokulakrishnan B
|
Karthikraja Tp
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Semantic Textual Relatedness (STR) will provide insight into the limitations of existing models and support ongoing work on semantic representations. Track A in Shared Task-1, provides pairs of sentences with semantic relatedness scores for 9 languages out of which 7 are low-resources. These languages are from four different language families. We developed models for 8 languages (except for Amharic) in Track A, using Sentence Transformers (SBERT) architecture, and fine-tuned them with multilingual and monolingual pre-trained language models (PLM). Our models for English (eng), Algerian Arabic (arq), andKinyarwanda (kin) languages were ranked 12, 5, and 8 respectively. Our submissions are ranked 5th among 40 submissions in Track A with an average Spearman correlation score of 0.74. However, we observed that the usage of monolingual PLMs did not guarantee better than multilingual PLMs in Marathi (mar), and Telugu (tel) languages in our case.