Relevance-aware Diverse Query Generation for Out-of-domain Text Ranking

Jia-Huei Ju, Chao-Han Yang, Szu-Wei Fu, Ming-Feng Tsai, Chuan-Ju Wang


Abstract
Domain adaptation presents significant challenges for out-of-domain text ranking, especially when supervised data is limited. In this paper, we present ReadQG (Relevance-Aware Diverse Query Generation), a method that generates informative synthetic queries to help text ranking models adapt to new domains. Unlike previous approaches that focus solely on generating relevant queries, ReadQG generates diverse queries with continuous relevance scores. Specifically, we propose leveraging soft-prompt tuning and diverse generation objectives to control query generation according to the given relevance. Our experiments show that integrating negative queries into the learning process enhances the effectiveness of text ranking models on out-of-domain information retrieval (IR) benchmarks. Furthermore, we measure the quality of query generation, highlighting the underlying beneficial characteristics of negative queries. Our empirical results and analysis also shed light on potential directions for more advanced data augmentation in IR. The data and code have been released.
Anthology ID:
2024.repl4nlp-1.3
Volume:
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Chen Zhao, Marius Mosbach, Pepa Atanasova, Seraphina Goldfarb-Tarrant, Peter Hase, Arian Hosseini, Maha Elbayad, Sandro Pezzelle, Maximilian Mozes
Venues:
RepL4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
26–36
URL:
https://aclanthology.org/2024.repl4nlp-1.3
Cite (ACL):
Jia-Huei Ju, Chao-Han Yang, Szu-Wei Fu, Ming-Feng Tsai, and Chuan-Ju Wang. 2024. Relevance-aware Diverse Query Generation for Out-of-domain Text Ranking. In Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024), pages 26–36, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Relevance-aware Diverse Query Generation for Out-of-domain Text Ranking (Ju et al., RepL4NLP-WS 2024)
PDF:
https://aclanthology.org/2024.repl4nlp-1.3.pdf