hub at SemEval-2021 Task 2: Word Meaning Similarity Prediction Model Based on RoBERTa and Word Frequency

Bo Huang, Yang Bai, Xiaobing Zhou


Abstract
This paper introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). The data of this shared task is mainly some cross-language or multi-language sentence pair corpus. The languages covered in the corpus include English, Chinese, French, Russian, and Arabic. The task goal is to judge whether the same words in these sentence pairs have the same meaning in the sentence. This can be seen as a task of binary classification of sentence pairs. What we need to do is to use our method to determine as accurately as possible the meaning of the words in a sentence pair are the same or different. The model used by our team is mainly composed of RoBERTa and Tf-Idf algorithms. The result evaluation index of task submission is the F1 score. We only participated in the English language task. The final score of the test set prediction results submitted by our team was 84.60.
Anthology ID:
2021.semeval-1.94
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP | SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
719–723
Language:
URL:
https://aclanthology.org/2021.semeval-1.94
DOI:
10.18653/v1/2021.semeval-1.94
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2021.semeval-1.94.pdf