Pat Verga

2025

Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress
Ayomide Odumakinde | Daniel D’souza | Pat Verga | Beyza Ermis | Sara Hooker
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Synthetic data has driven recent state-of-the-art advancements, but reliance on a single oracle teacher model can lead to model collapse and bias propagation. These issues are particularly severe in multilingual settings, where no single model excels across all languages. In this study, we propose multilingual arbitration, which exploits performance variations among multiple models for each language. By strategically routing samples through a diverse set of models, each with unique strengths, we mitigate these challenges and enhance multilingual performance. Extensive experiments with state-of-the-art models demonstrate that our approach significantly surpasses single-teacher distillation, achieving up to 80% win rates over proprietary and open-weight models like Gemma 2, Llama 3.1, and Mistral v0.3, with the largest improvements in low-resource languages.

pdf bib abs

FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan | Pasquale Minervini | Patrick Lewis | Pat Verga | Isabelle Augenstein
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Modern Question Answering (QA) and Reasoning approaches with Large Language Models (LLMs) commonly use Chain-of-Thought (CoT) prompting but struggle with generating outputs faithful to their intermediate reasoning chains. While neuro-symbolic methods like Faithful CoT (F-CoT) offer higher faithfulness through external solvers, they require code-specialized models and struggle with ambiguous tasks.We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), which uses LLMs to plan solutions, formalize queries into logic programs, and simulate code execution through multi-hop search without external solvers. Our method achieves SOTA results on 𝟕 out of 𝟗 diverse reasoning benchmarks and 3 out of 3 logic inference benchmarks while enabling measurement of reasoning faithfulness. We demonstrate that model faithfulness correlates with performance and that successful reasoning traces show an 18.1% increase in unique emergent facts, 8.6% higher overlap between code-defined and execution-trace relations, and 3.6% reduction in unused relations.

2023

pdf bib abs

To Adapt or to Annotate: Challenges and Interventions for Domain Adaptation in Open-Domain Question Answering
Dheeru Dua | Emma Strubell | Sameer Singh | Pat Verga
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in open-domain question answering (ODQA) have demonstrated impressive accuracy on general-purpose domains like Wikipedia. While some work has been investigating how well ODQA models perform when tested for out-of-domain (OOD) generalization, these studies have been conducted only under conservative shifts in data distribution and typically focus on a single component (i.e., retriever or reader) rather than an end-to-end system. This work proposes a more realistic end-to-end domain shift evaluation setting covering five diverse domains. We not only find that end-to-end models fail to generalize but that high retrieval scores often still yield poor answer prediction accuracy. To address these failures, we investigate several interventions, in the form of data augmentations, for improving model adaption and use our evaluation set to elucidate the relationship between the efficacy of an intervention scheme and the particular type of dataset shifts we consider. We propose a generalizability test that estimates the type of shift in a target dataset without training a model in the target domain and that the type of shift is predictive of which data augmentation schemes will be effective for domain adaption. Overall, we find that these interventions increase end-to-end performance by up to ~24 points.

pdf bib abs

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Wenhu Chen | Pat Verga | Michiel de Jong | John Wieting | William W. Cohen
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Existing state-of-the-art methods for open-domain question-answering (ODQA) use an open book approach in which information is first retrieved from a large text corpus or knowledge base (KB) and then reasoned over to produce an answer. A recent alternative is to retrieve from a collection of previously-generated question-answer pairs; this has several practical advantages including being more memory and compute-efficient. Question-answer pairs are also appealing in that they can be viewed as an intermediate between text and KB triples: like KB triples, they often concisely express a single relationship, but like text, have much higher coverage than traditional KBs. In this work, we describe a new QA system that augments a text-to-text model with a large memory of question-answer pairs, and a new pre-training task for the latent step of question retrieval. The pre-training task substantially simplifies training and greatly improves performance on smaller QA benchmarks. Unlike prior systems of this sort, our QA system can also answer multi-hop questions that do not explicitly appear in the collection of stored question-answer pairs.

2022

pdf bib abs

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
Wenhu Chen | Hexiang Hu | Xi Chen | Pat Verga | William Cohen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

While language Models store a massive amount of world knowledge implicitly in their parameters, even very large models often fail to encode information about rare entities and events, while incurring huge computational costs. Recently, retrieval-augmented models, such as REALM, RAG, and RETRO, have incorporated world knowledge into language generation by leveraging an external non-parametric index and have demonstrated impressive performance with constrained model sizes. However, these methods are restricted to retrieving only textual knowledge, neglecting the ubiquitous amount of knowledge in other modalities like images – much of which contains information not covered by any text. To address this limitation, we propose the first Multimodal Retrieval-Augmented Transformer (MuRAG), which accesses an external non-parametric multimodal memory to augment language generation. MuRAG is pre-trained with a mixture of large-scale image-text and text-only corpora using a joint contrastive and generative loss. We perform experiments on two different datasets that require retrieving and reasoning over both images and text to answer a given query: WebQA, and MultimodalQA. Our results show that MuRAG achieves state-of-the-art accuracy, outperforming existing models by 10-20% absolute on both datasets and under both distractor and full-wiki settings.

pdf bib abs

Faithful to the Document or to the World? Mitigating Hallucinations via Entity-Linked Knowledge in Abstractive Summarization
Yue Dong | John Wieting | Pat Verga
Findings of the Association for Computational Linguistics: EMNLP 2022

Existing abstractive summarization systems are hampered by content hallucinations in which models generate text that is not directly inferable from the source alone. Annotations from prior work have shown that some of these hallucinations, while being ‘unfaithful’ to the source, are nonetheless factual. Our analysis in this paper suggests that these factual hallucinations occur as a result of the prevalence of factual yet unfaithful entities in summarization datasets. We find that these entities are not aberrations, but instead examples of additional world knowledge being readily used to latently connect entities and concepts – in this case connecting entities in the source document to those in the target summary. In our analysis and experiments, we demonstrate that connecting entities to an external knowledge base can lend provenance to many of these unfaithful yet factual entities, and further, this knowledge can be used to improve the factuality of summaries without simply making them more extractive.

2021

pdf bib abs

Adaptable and Interpretable Neural MemoryOver Symbolic Knowledge
Pat Verga | Haitian Sun | Livio Baldini Soares | William Cohen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Past research has demonstrated that large neural language models (LMs) encode surprising amounts of factual information: however, augmenting or modifying this information requires modifying a corpus and retraining, which is computationally expensive. To address this problem, we develop a neural LM that includes an interpretable neuro-symbolic KB in the form of a “fact memory”. Each element of the fact memory is formed from a triple of vectors, where each vector corresponds to a KB entity or relation. Our LM improves performance on knowledge-intensive question-answering tasks, sometimes dramatically, including a 27 point increase in one setting of WebQuestionsSP over a state-of-the-art open-book model, despite using 5% of the parameters. Most interestingly, we demonstrate that the model can be modified, without any re-training, by updating the fact memory.

Co-authors

Venues

Fix author