HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity

Zihang Xu, Ziqing Yang, Yiming Cui, Zhigang Chen


Abstract
This paper describes our system designed for SemEval-2022 Task 8: Multilingual News Article Similarity. We proposed a linguistics-inspired model trained with a few task-specific strategies. The main techniques of our system are: 1) data augmentation, 2) multi-label loss, 3) adapted R-Drop, 4) samples reconstruction with the head-tail combination. We also present a brief analysis of some negative methods like two-tower architecture. Our system ranked 1st on the leaderboard while achieving a Pearson’s Correlation Coefficient of 0.818 on the official evaluation set.
Anthology ID:
2022.semeval-1.157
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Venues:
NAACL | SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Note:
Pages:
1114–1120
Language:
URL:
https://aclanthology.org/2022.semeval-1.157
DOI:
10.18653/v1/2022.semeval-1.157
Bibkey:
Cite (ACL):
Zihang Xu, Ziqing Yang, Yiming Cui, and Zhigang Chen. 2022. HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1114–1120, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity (Xu et al., SemEval 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.semeval-1.157.pdf
Code
 geekdream-x/semeval2022-task8-tonyx