Yingqi Qu


2024

pdf bib
BASES: Large-scale Web Search User Simulation with Large Language Model based Agents
Ruiyang Ren | Peng Qiu | Yingqi Qu | Jing Liu | Xin Zhao | Hua Wu | Ji-Rong Wen | Haifeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation. Considering the scarcity and limit (e.g., privacy issues) of real user data, in this paper, we conduct large-scale user simulations for the web search scenario to improve the analysis and modeling of user search behavior. Specially, we propose BASES, a novel user simulation framework with LLM-based agents, designed to facilitate comprehensive simulations of web search user behaviors. Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors. To demonstrate the effectiveness of BASES, we conduct evaluation experiments based on two human benchmarks in both Chinese and English, demonstrating that BASES can effectively simulate large-scale human-like search behaviors. To further accommodate the research on web search, we develop WARRIORS, a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions, which can greatly bolster research in the field of information retrieval.

pdf bib
Self-Evaluation of Large Language Model based on Glass-box Features
Hui Huang | Yingqi Qu | Jing Liu | Muyun Yang | Bing Xu | Tiejun Zhao | Wenpeng Lu
Findings of the Association for Computational Linguistics: EMNLP 2024

The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods. Existing works primarily rely on external evaluators, focusing on training and prompting strategies. However, a crucial aspect – model-aware glass-box features – is overlooked. In this study, we explore the utility of glass-box features under the scenario of self-evaluation, namely applying an LLM to evaluate its own output. We investigate various glass-box feature groups and discovered that the softmax distribution serves as a reliable quality indicator for self-evaluation. Experimental results on public benchmarks validate the feasibility of self-evaluation of LLMs using glass-box features.

2023

pdf bib
A Thorough Examination on Zero-shot Dense Retrieval
Ruiyang Ren | Yingqi Qu | Jing Liu | Xin Zhao | Qifei Wu | Yuchen Ding | Hua Wu | Haifeng Wang | Ji-Rong Wen
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent years have witnessed the significant advance in dense retrieval (DR) based on powerful pre-trained language models (PLM). DR models have achieved excellent performance in several benchmark datasets, while they are shown to be not as competitive as traditional sparse retrieval models (e.g., BM25) in a zero-shot retrieval setting. However, in the related literature, there still lacks a detailed and comprehensive study on zero-shot retrieval. In this paper, we present the first thorough examination of the zero-shot capability of DR models. We aim to identify the key factors and analyze how they affect zero-shot retrieval performance. In particular, we discuss the effect of several key factors related to source training set, analyze the potential bias from the target dataset, and review and compare existing zero-shot DR models. Our findings provide important evidence to better understand and develop zero-shot DR models.

2022

pdf bib
DuReader-Retrieval: A Large-scale Chinese Benchmark for Passage Retrieval from Web Search Engine
Yifu Qiu | Hongyu Li | Yingqi Qu | Ying Chen | QiaoQiao She | Jing Liu | Hua Wu | Haifeng Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we present DuReader-retrieval, a large-scale Chinese dataset for passage retrieval. DuReader-retrieval contains more than 90K queries and over 8M unique passages from a commercial search engine. To alleviate the shortcomings of other datasets and ensure the quality of our benchmark, we (1) reduce the false negatives in development and test sets by manually annotating results pooled from multiple retrievers, and (2) remove the training queries that are semantically similar to the development and testing queries. Additionally, we provide two out-of-domain testing sets for cross-domain evaluation, as well as a set of human translated queries for for cross-lingual retrieval evaluation. The experiments demonstrate that DuReader-retrieval is challenging and a number of problems remain unsolved, such as the salient phrase mismatch and the syntactic mismatch between queries and paragraphs. These experiments also show that dense retrievers do not generalize well across domains, and cross-lingual retrieval is essentially challenging. DuReader-retrieval is publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-Retrieval.

2021

pdf bib
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
Yingqi Qu | Yuchen Ding | Jing Liu | Kai Liu | Ruiyang Ren | Wayne Xin Zhao | Daxiang Dong | Hua Wu | Haifeng Wang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In open-domain question answering, dense passage retrieval has become a new paradigm to retrieve relevant passages for finding answers. Typically, the dual-encoder architecture is adopted to learn dense representations of questions and passages for semantic matching. However, it is difficult to effectively train a dual-encoder due to the challenges including the discrepancy between training and inference, the existence of unlabeled positives and limited training data. To address these challenges, we propose an optimized training approach, called RocketQA, to improving dense passage retrieval. We make three major technical contributions in RocketQA, namely cross-batch negatives, denoised hard negatives and data augmentation. The experiment results show that RocketQA significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions. We also conduct extensive experiments to examine the effectiveness of the three strategies in RocketQA. Besides, we demonstrate that the performance of end-to-end QA can be improved based on our RocketQA retriever.

pdf bib
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Ruiyang Ren | Shangwen Lv | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Ruiyang Ren | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both the two procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage reranking. A major contribution is that we introduce the dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved according to each other’s relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for listwise training approach. Extensive experiments show the effectiveness of our approach on both MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.