Revisiting Query Variation Robustness of Transformer Models

Tim Hagen; Harrisen Scells; Martin Potthast

doi:10.18653/v1/2024.findings-emnlp.248

Abstract

The most commonly used transformers for retrieval at present, BERT and T5, have been shown not to be robust to query variations such as typos or paraphrases. Although this is an important prerequisite for their practicality, this problem has hardly been investigated. More recent large language models (LLMs), including instruction-tuned LLMs, have not been analyzed yet, and only one study looks beyond typos. We close this gap by reproducing this study and extending it with a systematic analysis of more recent models, including Sentence-BERT, CharacterBERT, E5-Mistral, AnglE, and Ada v2. We further investigate if instruct-LLMs can be prompted for robustness. Our results are mixed in that the previously observed robustness issues for cross-encoders also apply to bi-encoders that use much larger LLMs, albeit to a lesser extent. While further LLM scaling may improve their embeddings, their cost-effective use for all but large deployments is limited. Training data that includes query variations allows LLMs to be fine-tuned for more robustness, but focusing on a single category of query variation may even degrade the effectiveness on others. Our code, results, and artifacts can be found at https://github.com/webis-de/EMNLP-24

Anthology ID: 2024.findings-emnlp.248
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4283–4296
URL: https://aclanthology.org/2024.findings-emnlp.248/
DOI: 10.18653/v1/2024.findings-emnlp.248
Cite (ACL): Tim Hagen, Harrisen Scells, and Martin Potthast. 2024. Revisiting Query Variation Robustness of Transformer Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4283–4296, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Revisiting Query Variation Robustness of Transformer Models (Hagen et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.248.pdf