A Self-supervised Joint Training Framework for Document Reranking

Xiaozhi Zhu, Tianyong Hao, Sijie Cheng, Fu Lee Wang, Hai Liu


Abstract
Pretrained language models such as BERT have been successfully applied to a wide range of natural language processing tasks and have also achieved impressive performance in document reranking. Recent work indicates that further pretraining a language model on the task-specific dataset before fine-tuning helps improve reranking performance. However, pretraining tasks such as masked language modeling and next sentence prediction are based on the context of documents and do not encourage the model to understand the content of queries in the document reranking task. In this paper, we propose a new self-supervised joint training framework (SJTF) with a self-supervised method called Masked Query Prediction (MQP) to establish semantic relations between given queries and positive documents. The framework randomly masks a token of the query, encodes the masked query paired with a positive document, and uses a linear layer as a decoder to predict the masked token. In addition, MQP is used to optimize the model jointly with the supervised ranking objective during the fine-tuning stage, so no extra further pretraining stage is needed. Extensive experiments on the MS MARCO passage ranking and TREC Robust datasets show that models trained with our framework obtain significant improvements over the original models.
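The MQP step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the BERT-style `[CLS]`/`[SEP]`/`[MASK]` tokens, the word-level tokenization, and the weighted-sum form of the joint loss (with the hypothetical `mqp_weight` parameter) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random

MASK = "[MASK]"  # assumed BERT-style mask token


def mask_query(query_tokens, rng):
    """Replace one randomly chosen query token with MASK.

    Returns the masked copy, the masked position, and the original token.
    """
    pos = rng.randrange(len(query_tokens))
    masked = list(query_tokens)  # copy so the caller's list is untouched
    target = masked[pos]
    masked[pos] = MASK
    return masked, pos, target


def build_input(masked_query, doc_tokens):
    """Pair the masked query with a positive document (assumed BERT layout)."""
    return ["[CLS]"] + masked_query + ["[SEP]"] + doc_tokens + ["[SEP]"]


def cross_entropy(logits, target_index):
    """Cross-entropy of one prediction, via a numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def joint_loss(ranking_loss, mqp_loss, mqp_weight=1.0):
    """Assumed combination: ranking loss plus a weighted MQP loss."""
    return ranking_loss + mqp_weight * mqp_loss


if __name__ == "__main__":
    rng = random.Random(0)
    query = ["what", "is", "document", "reranking"]
    doc = ["reranking", "reorders", "retrieved", "documents"]
    masked, pos, target = mask_query(query, rng)
    print(build_input(masked, doc), "-> predict", repr(target))
```

In a real model, the logits passed to `cross_entropy` would come from the linear decoder applied to the encoder output at the masked position (index `pos + 1` in the paired input, because of the leading `[CLS]`), and the ranking loss would come from the supervised reranking objective on the same batch.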
Anthology ID:
2022.findings-naacl.79
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Venues:
Findings | NAACL
Publisher:
Association for Computational Linguistics
Pages:
1056–1065
URL:
https://aclanthology.org/2022.findings-naacl.79
DOI:
10.18653/v1/2022.findings-naacl.79
Cite (ACL):
Xiaozhi Zhu, Tianyong Hao, Sijie Cheng, Fu Lee Wang, and Hai Liu. 2022. A Self-supervised Joint Training Framework for Document Reranking. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1056–1065, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
A Self-supervised Joint Training Framework for Document Reranking (Zhu et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.79.pdf