Olivier Bachem


2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Paul Roit | Johan Ferret | Lior Shani | Roee Aharoni | Geoffrey Cideron | Robert Dadashi | Matthieu Geist | Sertan Girgin | Leonard Hussenot | Orgad Keller | Nikola Momchev | Sabela Ramos Garea | Piotr Stanczyk | Nino Vieillard | Olivier Bachem | Gal Elidan | Avinatan Hassidim | Olivier Pietquin | Idan Szpektor
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries.

2022

Decoding a Neural Retriever’s Latent Space for Query Suggestion
Leonard Adolphs | Michelle Chen Huebscher | Christian Buck | Sertan Girgin | Olivier Bachem | Massimiliano Ciaramita | Thomas Hofmann
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a “query decoder” that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand “what should have been asked” to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.