MAIR: A Massive Benchmark for Evaluating Instructed Retrieval

Weiwei Sun; Zhengliang Shi; Wu Jiu Long; Lingyong Yan; Xinyu Ma; Yiding Liu; Min Cao; Dawei Yin; Zhaochun Ren

doi:10.18653/v1/2024.emnlp-main.778

Abstract

Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks, enabling them to perform well on a wide range of tasks and potentially generalize to unseen tasks with instructions. However, existing IR benchmarks focus on a limited scope of tasks, making them insufficient for evaluating the latest IR models. In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. We benchmark state-of-the-art instruction-tuned text embedding models and re-ranking models. Our experiments reveal that instruction-tuned models generally achieve superior performance compared to non-instruction-tuned models on MAIR. Additionally, our results suggest that current instruction-tuned text embedding models and re-ranking models still lack effectiveness in specific long-tail tasks. MAIR is publicly available at https://github.com/sunnweiwei/Mair.

Anthology ID: 2024.emnlp-main.778
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 14044–14067
URL: https://aclanthology.org/2024.emnlp-main.778/
DOI: 10.18653/v1/2024.emnlp-main.778
Cite (ACL): Weiwei Sun, Zhengliang Shi, Wu Jiu Long, Lingyong Yan, Xinyu Ma, Yiding Liu, Min Cao, Dawei Yin, and Zhaochun Ren. 2024. MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14044–14067, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): MAIR: A Massive Benchmark for Evaluating Instructed Retrieval (Sun et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.778.pdf