Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity

Shotaro Ishihara; Hono Shirai

doi:10.18653/v1/2022.semeval-1.171

Abstract

This paper describes our system for SemEval-2022 Task 8, in which participants were required to predict the similarity of two multilingual news articles. For pairwise sentence and document scoring, there are two main approaches: the Cross-Encoder, which feeds a pair of texts into a single encoder, and the Bi-Encoder, which encodes each input independently. The former often achieves higher performance, but the latter gave us better results in SemEval-2022 Task 8. This paper presents our exploration of a BERT-based Bi-Encoder approach for this task and reports findings on pretrained models, pooling methods, translation, data separation, and the number of tokens. A weighted average ensemble of four models achieved a competitive result and ranked in the top 12.
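The Bi-Encoder idea and the weighted average ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a BERT model (it mean-pools deterministic per-token vectors derived from a hash), cosine similarity is only an assumed way of comparing the two embeddings, and the example texts and ensemble weights are made up for illustration.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_encoder(text: str, dim: int = 16, seed: int = 0) -> Vector:
    """Hypothetical stand-in for a BERT encoder (assumption, not the paper's model).

    Each whitespace token gets a deterministic pseudo-embedding from SHA-256,
    and the token vectors are mean-pooled into one fixed-size vector.
    """
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * dim
    pooled = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(f"{seed}:{tok}".encode("utf-8")).digest()
        for i in range(dim):  # dim must be <= 32 (SHA-256 digest length)
            pooled[i] += (digest[i] - 127.5) / 127.5
    return [value / len(tokens) for value in pooled]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def bi_encoder_score(text_a: str, text_b: str,
                     encoder: Callable[[str], Vector]) -> float:
    """Bi-Encoder: encode each text independently, then compare the vectors."""
    return cosine(encoder(text_a), encoder(text_b))


def weighted_average(predictions: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted average of per-model predictions for one article pair."""
    if len(predictions) != len(weights) or not weights:
        raise ValueError("predictions and weights must be non-empty and equal length")
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


if __name__ == "__main__":
    a = "central bank raises interest rates"
    b = "interest rates raised by the central bank"
    # Four hypothetical "models", differing only in the toy encoder's seed.
    scores = [bi_encoder_score(a, b, lambda t, s=s: toy_encoder(t, seed=s))
              for s in range(4)]
    print(weighted_average(scores, weights=[0.4, 0.3, 0.2, 0.1]))
```

A Cross-Encoder would instead pass both texts through one encoder as a single joint input, so the pair cannot be scored from independently computed embeddings as it is here.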

Anthology ID: 2022.semeval-1.171
Volume: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month: July
Year: 2022
Address: Seattle, United States
Editors: Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1208–1214
URL: https://aclanthology.org/2022.semeval-1.171/
DOI: 10.18653/v1/2022.semeval-1.171
Cite (ACL): Shotaro Ishihara and Hono Shirai. 2022. Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1208–1214, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity (Ishihara & Shirai, SemEval 2022)
PDF: https://aclanthology.org/2022.semeval-1.171.pdf
Video: https://aclanthology.org/2022.semeval-1.171.mp4
