Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Paul Roit | Johan Ferret | Lior Shani | Roee Aharoni | Geoffrey Cideron | Robert Dadashi | Matthieu Geist | Sertan Girgin | Leonard Hussenot | Orgad Keller | Nikola Momchev | Sabela Ramos Garea | Piotr Stanczyk | Nino Vieillard | Olivier Bachem | Gal Elidan | Avinatan Hassidim | Olivier Pietquin | Idan Szpektor
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries.


Learning Natural Language Generation with Truncated Reinforcement Learning
Alice Martin | Guillaume Quispe | Charles Ollion | Sylvain Le Corff | Florian Strub | Olivier Pietquin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models without a supervised learning phase, using only reinforcement learning (RL). As RL methods scale poorly to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus makes it possible to train a language agent solely through interaction with its environment, without any task-specific prior knowledge; it is guided only by a task-agnostic language model. Interestingly, this approach avoids the dependency on labelled datasets and inherently reduces pretrained policy flaws such as language or exposure biases. We evaluate TrufLL on two visual question generation tasks, for which we report positive results on performance and language metrics, which we then corroborate with a human evaluation. To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.
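
The vocabulary truncation described in this abstract can be illustrated with a minimal sketch. The abstract does not say which truncation rule is used, so the top-k rule below is an assumption, and the function names, the toy vocabulary, and all numbers are hypothetical. The idea shown is only that the policy samples from the tokens a generic language model rates most likely, rather than from the full vocabulary.

```python
import math
import random

def truncate_actions(lm_probs, k):
    """Return the ids of the k tokens the language model rates most likely.

    Top-k is an assumed truncation rule, chosen here for illustration.
    """
    ranked = sorted(range(len(lm_probs)), key=lambda i: lm_probs[i], reverse=True)
    return ranked[:k]

def sample_truncated(policy_logits, lm_probs, k, rng):
    """Sample one token id from the policy, restricted to the truncated set."""
    allowed = truncate_actions(lm_probs, k)
    # Softmax over the allowed logits only; tokens outside the set get zero mass.
    max_logit = max(policy_logits[i] for i in allowed)
    weights = [math.exp(policy_logits[i] - max_logit) for i in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]

# Hypothetical toy data: a six-word vocabulary and made-up scores.
vocab = ["the", "cat", "is", "what", "color", "zzz"]
lm_probs = [0.30, 0.05, 0.10, 0.35, 0.19, 0.01]   # generic language model
policy_logits = [0.0, 2.0, 0.5, 1.0, 0.2, 5.0]    # RL policy being trained

allowed = truncate_actions(lm_probs, 3)
token = sample_truncated(policy_logits, lm_probs, 3, random.Random(0))
print([vocab[i] for i in allowed], vocab[token])
```

In this toy setting the policy strongly prefers the token "zzz", but it can never be sampled because the language model ranks it outside the truncated set.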


Supervised Seeded Iterated Learning for Interactive Language Learning
Yuchen Lu | Soumye Singhal | Florian Strub | Olivier Pietquin | Aaron Courville
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Language drift has been one of the major obstacles to training language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their own language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapses and higher negative likelihood when evaluated on a human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning (SSIL), which combines both methods to minimize their respective weaknesses. We then show the effectiveness of SSIL in the language-drift translation game.

Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Olivier Pietquin | Smaranda Muresan | Vivian Chen | Casey Kennington | David Vandyke | Nina Dethlefs | Koji Inoue | Erik Ekstedt | Stefan Ultes
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue


LIG-CRIStAL Submission for the WMT 2017 Automatic Post-Editing Task
Alexandre Bérard | Laurent Besacier | Olivier Pietquin
Proceedings of the Second Conference on Machine Translation


MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
Alexandre Bérard | Christophe Servan | Olivier Pietquin | Laurent Besacier
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes word2vec’s features, paragraph vector (batch and online) and bivec for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.


Human-Machine Dialogue as a Stochastic Game
Merwan Barlier | Julien Perolat | Romain Laroche | Olivier Pietquin
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue


NASTIA: Negotiating Appointment Setting Interface
Layla El Asri | Rémi Lemonnier | Romain Laroche | Olivier Pietquin | Hatim Khouzaimi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes a French Spoken Dialogue System (SDS) named NASTIA (Negotiating Appointment SeTting InterfAce). Appointment scheduling is a hybrid task halfway between slot-filling and negotiation. NASTIA implements three different negotiation strategies. These strategies were tested on 1734 dialogues with 385 users who interacted at most 5 times with the SDS and gave a rating on a scale of 1 to 10 for each dialogue. Previous appointment scheduling systems were evaluated with the same experimental protocol. NASTIA is different from these systems in that it can adapt its strategy during the dialogue. The highest system task completion rate with these systems was 81% whereas NASTIA had an 88% average and its best performing strategy even reached 92%. This strategy also significantly outperformed previous systems in terms of overall user rating with an average of 8.28 against 7.40. The experiment also enabled highlighting global recommendations for building spoken dialogue systems.

DINASTI: Dialogues with a Negotiating Appointment Setting Interface
Layla El Asri | Romain Laroche | Olivier Pietquin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the DINASTI (DIalogues with a Negotiating Appointment SeTting Interface) corpus, which is composed of 1734 dialogues with the French spoken dialogue system NASTIA (Negotiating Appointment SeTting InterfAce). NASTIA is a reinforcement learning-based system. The DINASTI corpus was collected while the system was following a uniform policy. Each entry of the corpus is a system-user exchange annotated with 120 automatically computable features. The corpus contains a total of 21587 entries, with 385 testers. Each tester performed at most five scenario-based interactions with NASTIA. The dialogues last an average of 10.82 dialogue turns, with 4.45 reinforcement learning decisions. The testers filled an evaluation questionnaire after each dialogue. The questionnaire includes three questions to measure task completion. In addition, it comprises 7 Likert-scaled items evaluating several aspects of the interaction, a numerical overall evaluation on a scale of 1 to 10, and a free text entry. Answers to this questionnaire are provided with DINASTI. This corpus is meant for research on reinforcement learning modelling for dialogue management.


Model-free POMDP optimisation of tutoring systems with echo-state networks
Lucie Daubigney | Matthieu Geist | Olivier Pietquin
Proceedings of the SIGDIAL 2013 Conference


Optimisation d’un tuteur intelligent à partir d’un jeu de données fixé (Optimization of a tutoring system from a fixed set of data) [in French]
Lucie Daubigney | Matthieu Geist | Olivier Pietquin
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

Statistical User Simulation for Spoken Dialogue Systems: What for, Which Data, Which Future?
Olivier Pietquin
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)


Training a BN-based user model for dialogue simulation with missing data
Stéphane Rossignol | Olivier Pietquin | Michel Ianotto
Proceedings of 5th International Joint Conference on Natural Language Processing


Sparse Approximate Dynamic Programming for Dialog Management
Senthilkumar Chandramohan | Matthieu Geist | Olivier Pietquin
Proceedings of the SIGDIAL 2010 Conference


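Réseau bayesien pour un modèle d’utilisateur et un module de compréhension pour l’optimisation des systèmes de dialogues (Bayesian network for a user model and an understanding module for the optimisation of dialogue systems) [in French]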
Olivier Pietquin
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this paper, a modular environment for the automatic simulation of human-machine dialogues is proposed. This environment notably includes a consistent, goal-directed user model and a simulated speech understanding module. A Bayesian network underlies both models and, depending on the parameters used, it can either generate coherent user behaviour or serve as a concept classifier. The environment was used to optimise dialogue strategies on a simple form-filling task, and the results show that it is then possible to identify certain dialogues that are problematic from the point of view of understanding.