Gabriel Luthier


2022

Constrained Language Models for Interactive Poem Generation
Andrei Popescu-Belis | Àlex Atrio | Valentin Minder | Aris Xanthos | Gabriel Luthier | Simon Mattei | Antonio Rodriguez
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes a system for interactive poem generation, which combines neural language models (LMs) for poem generation with explicit constraints that can be set by users on form, topic, emotion, and rhyming scheme. LMs cannot learn such constraints from the data, which is scarce with respect to their needs even for a well-resourced language such as French. We propose a method to generate verses and stanzas by combining LMs with rule-based algorithms, and compare several approaches for adjusting the words of a poem to a desired combination of topics or emotions. An approach to automatic rhyme setting using a phonetic dictionary is proposed as well. Our system has been demonstrated at public events, and log analysis shows that users found it engaging.

2021

The IICT-Yverdon System for the WMT 2021 Unsupervised MT and Very Low Resource Supervised MT Task
Àlex R. Atrio | Gabriel Luthier | Axel Fahy | Giorgos Vernikos | Andrei Popescu-Belis | Ljiljana Dolamic
Proceedings of the Sixth Conference on Machine Translation

In this paper, we present the systems submitted by our team from the Institute of ICT (HEIG-VD / HES-SO) to the Unsupervised MT and Very Low Resource Supervised MT task. We first study the improvements brought to a baseline system by techniques such as back-translation and initialization from a parent model. We find that both techniques are beneficial and suffice to reach performance that compares with more sophisticated systems from the 2020 task. We then present the application of this system to the 2021 task for low-resource supervised Upper Sorbian (HSB) to German translation, in both directions. Finally, we present a contrastive system for HSB-DE in both directions, and for unsupervised German to Lower Sorbian (DSB) translation, which uses multi-task training with various training schedules to improve over the baseline.

2020

A Consolidated Dataset for Knowledge-based Question Generation using Predicate Mapping of Linked Data
Johanna Melly | Gabriel Luthier | Andrei Popescu-Belis
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

In this paper, we present the ForwardQuestions data set, made of human-generated questions related to knowledge triples. This data set results from the conversion and merger of the existing SimpleDBPediaQA and SimpleQuestionsWikidata data sets, including the mapping of predicates from DBPedia to Wikidata, and the selection of ‘forward’ questions as opposed to ‘backward’ ones. The new data set can be used to generate novel questions given an unseen Wikidata triple, by replacing the subjects of existing questions with the new one and then selecting the best candidate questions using semantic and syntactic criteria. Evaluation results indicate that the question generation method using ForwardQuestions improves the quality of questions by about 20% with respect to a baseline not using ranking criteria.

Chat or Learn: a Data-Driven Robust Question-Answering System
Gabriel Luthier | Andrei Popescu-Belis
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a voice-based conversational agent which combines the robustness of chatbots and the utility of question answering (QA) systems. Indeed, while data-driven chatbots are typically user-friendly but not goal-oriented, QA systems tend to perform poorly at chitchat. The proposed chatbot relies on a controller which performs dialogue act classification and feeds user input either to a sequence-to-sequence chatbot or to a QA system. The resulting chatbot is a spoken QA application for the Google Home smart speaker. The system is endowed with general-domain knowledge from Wikipedia articles and uses coreference resolution to detect relatedness between questions. We present our choices of data sets for training and testing the components, and report the experimental results that helped us optimize the parameters of the chatbot. In particular, we discuss the appropriateness of using the SQuAD data set for evaluating end-to-end QA, in light of our system’s behavior.