Jon Saad-Falcon


pdf bib
Embedding Recycling for Language Models
Jon Saad-Falcon | Amanpreet Singh | Luca Soldaini | Mike D’Arcy | Arman Cohan | Doug Downey
Findings of the Association for Computational Linguistics: EACL 2023

Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have been proposed, their practical effectiveness is still unknown because existing evaluations consider very few models and do not adequately account for overhead costs. We perform an extensive evaluation of ER across eight different models (17 to 900 million parameters) and fourteen tasks in English. We show how a simple ER technique that caches activations from an intermediate layer of a pretrained model, and learns task-specific adapters on the later layers, is broadly effective. For the best-performing baseline in our experiments (DeBERTa-v2 XL), adding a precomputed cache results in a 90% speedup during training and 87-91% speedup for inference, with negligible impact on accuracy. Our analysis reveals important areas of future work.

pdf bib
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Keshav Santhanam | Jon Saad-Falcon | Martin Franz | Omar Khattab | Avi Sil | Radu Florian | Md Arafat Sultan | Salim Roukos | Matei Zaharia | Christopher Potts
Findings of the Association for Computational Linguistics: ACL 2023

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as a query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.

pdf bib
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Jon Saad-Falcon | Omar Khattab | Keshav Santhanam | Radu Florian | Martin Franz | Salim Roukos | Avirup Sil | Md Sultan | Christopher Potts
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.


pdf bib
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Keshav Santhanam | Omar Khattab | Jon Saad-Falcon | Christopher Potts | Matei Zaharia
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the space footprint of late interaction models by 6–10x.


pdf bib
Examining the Ordering of Rhetorical Strategies in Persuasive Requests
Omar Shaikh | Jiaao Chen | Jon Saad-Falcon | Polo Chau | Diyi Yang
Findings of the Association for Computational Linguistics: EMNLP 2020

Interpreting how persuasive language influences audiences has implications across many domains like advertising, argumentation, and propaganda. Persuasion relies on more than a message’s content. Arranging the order of the message itself (i.e., ordering specific rhetorical strategies) also plays an important role. To examine how strategy orderings contribute to persuasiveness, we first utilize a Variational Autoencoder model to disentangle content and rhetorical strategies in textual requests from a large-scale loan request corpus. We then visualize interplay between content and strategy through an attentional LSTM that predicts the success of textual requests. We find that specific (orderings of) strategies interact uniquely with a request’s content to impact success rate, and thus the persuasiveness of a request.