DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation

Joseph Hajjar, Weicheng Ma, Soroush Vosoughi


Abstract
This paper presents our approach to SemEval-2022 Task 8: Multilingual News Article Similarity. Our experiments show that even when using multilingual pre-trained language models (LMs), translating the text into a single common language yields the best evaluation performance. We also find that stylometric features of the text and meta-information about the news articles can be predicted from the text with low error rates, and that these predictions could be used to improve the prediction of the overall similarity scores. These findings suggest substantial correlations between authorship information and topical similarity estimation, which shed light on future research in stylometry and topic modeling.
Anthology ID:
2022.semeval-1.163
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1157–1162
URL:
https://aclanthology.org/2022.semeval-1.163
DOI:
10.18653/v1/2022.semeval-1.163
Cite (ACL):
Joseph Hajjar, Weicheng Ma, and Soroush Vosoughi. 2022. DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1157–1162, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation (Hajjar et al., SemEval 2022)
PDF:
https://aclanthology.org/2022.semeval-1.163.pdf