Benjamin Piwowarski


pdf bib
Choisir le bon co-équipier pour la génération coopérative de texte (Choosing The Right Teammate For Cooperative Text Generation)
Antoine Chaffin | Vincent Claveau | Ewa Kijak | Sylvain Lamprier | Benjamin Piwowarski | Thomas Scialom | Jacopo Staiano
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Les modèles de langue génèrent des textes en prédisant successivement des distributions de probabilité pour les prochains tokens en fonction des tokens précédents. Pour générer des textes avec des propriétés souhaitées (par ex. être plus naturels, non toxiques ou avoir un style d’écriture spécifique), une solution — le décodage coopératif — consiste à utiliser un classifieur lors de la génération pour guider l’échantillonnage de la distribution du modèle de langue vers des textes ayant la propriété attendue. Dans cet article, nous examinons trois familles de discriminateurs (basés sur des transformers) pour cette tâche de décodage coopératif : les discriminateurs bidirectionnels, unidirectionnels (de gauche à droite) et génératifs. Nous évaluons leurs avantages et inconvénients, en explorant leur précision respective sur des tâches de classification, ainsi que leur impact sur la génération coopérative et leur coût de calcul, dans le cadre d’une stratégie de décodage état de l’art, basée sur une recherche arborescente de Monte-Carlo (MCTS). Nous fournissons également l’implémentation (batchée) utilisée pour nos expériences.


pdf bib
QuestEval: Summarization Asks for Fact-based Evaluation
Thomas Scialom | Paul-Alexis Dray | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano | Alex Wang | Patrick Gallinari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments. In this paper, we extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in extensive experiments.

pdf bib
Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
Clement Rebuffel | Thomas Scialom | Laure Soulier | Benjamin Piwowarski | Sylvain Lamprier | Jacopo Staiano | Geoffrey Scoutheeten | Patrick Gallinari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

QuestEval is a reference-less metric used in text-to-text tasks, that compares the generated summaries directly to the source text, by automatically asking and answering questions. Its adaptation to Data-to-Text tasks is not straightforward, as it requires multimodal Question Generation and Answering systems on the considered tasks, which are seldom available. To this purpose, we propose a method to build synthetic multimodal corpora enabling to train multimodal components for a data-QuestEval metric. The resulting metric is reference-less and multimodal; it obtains state-of-the-art correlations with human judgment on the WebNLG and WikiBio benchmarks. We make data-QuestEval’s code and models available for reproducibility purpose, as part of the QuestEval project.

pdf bib
Skim-Attention: Learning to Focus via Document Layout
Laura Nguyen | Thomas Scialom | Jacopo Staiano | Benjamin Piwowarski
Findings of the Association for Computational Linguistics: EMNLP 2021

Transformer-based pre-training techniques of text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention only attends to the 2-dimensional position of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works, while being more computationally efficient. Skim-Attention can be further combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any Pre-trained Language Model, allowing to improve their performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.


pdf bib
MLSUM: The Multilingual Summarization Corpus
Thomas Scialom | Paul-Alexis Dray | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages – namely, French, German, Spanish, Russian, Turkish. Together with English news articles from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multi-lingual dataset.


pdf bib
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Patrick Bordes | Eloi Zablocki | Laure Soulier | Benjamin Piwowarski | Patrick Gallinari
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Language grounding is an active field aiming at enriching textual representations with visual information. Generally, textual and visual elements are embedded in the same representation space, which implicitly assumes a one-to-one correspondence between modalities. This hypothesis does not hold when representing words, and becomes problematic when used to learn sentence representations — the focus of this paper — as a visual scene can be described by a wide variety of sentences. To overcome this limitation, we propose to transfer visual information to textual representations by learning an intermediate representation space: the grounded space. We further propose two new complementary objectives ensuring that (1) sentences associated with the same visual content are close in the grounded space and (2) similarities between related elements are preserved across modalities. We show that this model outperforms the previous state-of-the-art on classification and semantic relatedness tasks.

pdf bib
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Thomas Scialom | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL enables to consider complex, possibly non differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most used summarization metric, is known to suffer from bias towards lexical similarity as well as from sub-optimal accounting for fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, favorably compare to ROUGE – with the additional property of not requiring reference summaries. Training a RL-based model on these metrics leads to improvements (both in terms of human or automated metrics) over current approaches that use ROUGE as reward.

pdf bib
Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Étienne Simon | Vincent Guigue | Benjamin Piwowarski
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted in average. These losses improve the performance of discriminative based models, and enable us to train deep neural networks satisfactorily, surpassing current state of the art on three different datasets.

pdf bib
Self-Attention Architectures for Answer-Agnostic Neural Question Generation
Thomas Scialom | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural architectures based on self-attention, such as Transformers, recently attracted interest from the research community, and obtained significant improvements over the state of the art in several tasks. We explore how Transformers can be adapted to the task of Neural Question Generation without constraining the model to focus on a specific answer passage. We study the effect of several strategies to deal with out-of-vocabulary words such as copy mechanisms, placeholders, and contextual word embeddings. We report improvements obtained over the state-of-the-art on the SQuAD dataset according to automated metrics (BLEU, ROUGE), as well as qualitative human assessments of the system outputs.