Àlex R. Atrio

Also published as: Àlex Atrio


pdf bib
GPoeT: a Language Model Trained for Rhyme Generation on Synthetic Data
Andrei Popescu-Belis | Àlex R. Atrio | Bastien Bernath | Etienne Boisson | Teo Ferrari | Xavier Theimer-Lienhard | Giorgos Vernikos
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Poem generation with language models requires the modeling of rhyming patterns. We propose a novel solution for learning to rhyme, based on synthetic data generated with a rule-based rhyming algorithm. The algorithm and an evaluation metric use a phonetic dictionary and the definitions of perfect and assonant rhymes. We fine-tune a GPT-2 English model with 124M parameters on 142 MB of natural poems and find that this model generates consecutive rhymes infrequently (11%). We then fine-tune the model on 6 MB of synthetic quatrains with consecutive rhymes (AABB) and obtain nearly 60% of rhyming lines in samples generated by the model. Alternating rhymes (ABAB) are more difficult to model because of longer-range dependencies, but they are still learnable from synthetic data, reaching 45% of rhyming lines in generated samples.

pdf bib
A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation
Àlex R. Atrio | Alexis Allemann | Ljiljana Dolamic | Andrei Popescu-Belis
Proceedings of the The Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)

Training neural MT systems for low-resource language pairs or in unsupervised settings (i.e. with no parallel data) often involves a large number of auxiliary systems. These may include parent systems trained on higher-resource pairs and used for initializing the parameters of child systems, multilingual systems for neighboring languages, and several stages of systems trained on pseudo-parallel data obtained through back-translation. We propose here a simplified pipeline, which we compare to the best submissions to the WMT 2021 Shared Task on Unsupervised MT and Very Low Resource Supervised MT. Our pipeline only needs two parents, two children, one round of back-translation for low-resource directions and two for unsupervised ones and obtains better or similar scores when compared to more complex alternatives.


pdf bib
Constrained Language Models for Interactive Poem Generation
Andrei Popescu-Belis | Àlex Atrio | Valentin Minder | Aris Xanthos | Gabriel Luthier | Simon Mattei | Antonio Rodriguez
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes a system for interactive poem generation, which combines neural language models (LMs) for poem generation with explicit constraints that can be set by users on form, topic, emotion, and rhyming scheme. LMs cannot learn such constraints from the data, which is scarce with respect to their needs even for a well-resourced language such as French. We propose a method to generate verses and stanzas by combining LMs with rule-based algorithms, and compare several approaches for adjusting the words of a poem to a desired combination of topics or emotions. An approach to automatic rhyme setting using a phonetic dictionary is proposed as well. Our system has been demonstrated at public events, and log analysis shows that users found it engaging.

pdf bib
On the Interaction of Regularization Factors in Low-resource Neural Machine Translation
Àlex R. Atrio | Andrei Popescu-Belis
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We explore the roles and interactions of the hyper-parameters governing regularization, and propose a range of values applicable to low-resource neural machine translation. We demonstrate that default or recommended values for high-resource settings are not optimal for low-resource ones, and that more aggressive regularization is needed when resources are scarce, in proportion to their scarcity. We explain our observations by the generalization abilities of sharp vs. flat basins in the loss landscape of a neural network. Results for four regularization factors corroborate our claim: batch size, learning rate, dropout rate, and gradient clipping. Moreover, we show that optimal results are obtained when using several of these factors, and that our findings generalize across datasets of different sizes and languages.


pdf bib
The IICT-Yverdon System for the WMT 2021 Unsupervised MT and Very Low Resource Supervised MT Task
Àlex R. Atrio | Gabriel Luthier | Axel Fahy | Giorgos Vernikos | Andrei Popescu-Belis | Ljiljana Dolamic
Proceedings of the Sixth Conference on Machine Translation

In this paper, we present the systems submitted by our team from the Institute of ICT (HEIG-VD / HES-SO) to the Unsupervised MT and Very Low Resource Supervised MT task. We first study the improvements brought to a baseline system by techniques such as back-translation and initialization from a parent model. We find that both techniques are beneficial and suffice to reach performance that compares with more sophisticated systems from the 2020 task. We then present the application of this system to the 2021 task for low-resource supervised Upper Sorbian (HSB) to German translation, in both directions. Finally, we present a contrastive system for HSB-DE in both directions, and for unsupervised German to Lower Sorbian (DSB) translation, which uses multi-task training with various training schedules to improve over the baseline.

pdf bib
Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex Atrio | Andrei Popescu-Belis
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

We study the role of an essential hyper-parameter that governs the training of Transformers for neural machine translation in a low-resource setting: the batch size. Using theoretical insights and experimental evidence, we argue against the widespread belief that batch size should be set as large as allowed by the memory of the GPUs. We show that in a low-resource setting, a smaller batch size leads to higher scores in a shorter training time, and argue that this is due to better regularization of the gradients during training.