Baptiste Pannier
2022
PAGnol: An Extra-Large French Generative Model
Julien Launay | E.L. Tommasone | Baptiste Pannier | François Boniface | Amélie Chatelain | Alessandro Cappelli | Iacopo Poli | Djamé Seddah
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained from scratch for the French language. We plan to train increasingly large and better-performing versions of PAGnol, exploring the capabilities of French extreme-scale models. For this first release, we focus on the pre-training and scaling calculations underlying PAGnol. We fit a scaling law for compute for the French language and compare it with its English counterpart. We find that the pre-training dataset significantly conditions the quality of the outputs, with common datasets such as OSCAR leading to low-quality, offensive text. We evaluate our models on discriminative and generative tasks in French, comparing them to other state-of-the-art French and multilingual models, and reaching the state of the art on the abstractive summarization task. Our research was conducted on the public GENCI Jean Zay supercomputer, and our models up to the Large size are made publicly available.
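The compute scaling-law fit mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's fit: the power-law form L(C) = (C_c / C)^alpha_C follows Kaplan et al. (2020), and the compute/loss values, units, and resulting constants below are made-up placeholders.

```python
import numpy as np

# Hypothetical (compute, validation loss) pairs -- placeholders, NOT the paper's measurements.
compute = np.array([1e-2, 1e-1, 1e0, 1e1, 1e2])   # compute budget, e.g. in PF-days
loss    = np.array([5.2, 4.3, 3.6, 3.0, 2.5])     # validation loss, nats/token

# A power law L(C) = (C_c / C)^alpha_C is a straight line in log-log space:
#   log L = alpha_C * log C_c - alpha_C * log C
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
alpha_c = -slope
c_c = np.exp(intercept / alpha_c)
print(f"alpha_C ~= {alpha_c:.3f}, C_c ~= {c_c:.3g} PF-days")
```

Fitting in log-log space keeps the regression linear; comparing the fitted alpha_C across languages is then a matter of repeating the fit on each language's loss curve.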
2020
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning
Alexandre Tamborrino | Nicola Pellicanò | Baptiste Pannier | Pascal Voitot | Louise Naudin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. Most of the existing approaches rely on a randomly initialized classifier on top of such networks. We argue that this fine-tuning procedure is sub-optimal, as the pre-trained model has no prior on the specific classifier labels, while it might have already learned an intrinsic textual representation of the task. In this paper, we introduce a new scoring method that casts a plausibility ranking task in a full-text format and leverages the masked language modeling head tuned during the pre-training phase. We study commonsense reasoning tasks where the model must rank a set of hypotheses given a premise, focusing on the COPA, Swag, HellaSwag and CommonsenseQA datasets. By exploiting our scoring method without fine-tuning, we are able to produce strong baselines (e.g. 80% test accuracy on COPA) that are comparable to supervised approaches. Moreover, when fine-tuning directly on the proposed scoring function, we show that our method provides a much more stable training phase across random restarts (e.g. a 10x reduction in the standard deviation of COPA test accuracy) and requires less annotated data than the standard classifier approach to reach equivalent performance.
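As a rough sketch of MLM-head plausibility scoring in the spirit of the method above (not the paper's exact procedure: the model name, the choice to score the hypothesis tokens rather than the premise tokens, and the length normalization are illustrative assumptions):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative model choice; any masked LM with a pre-trained MLM head would do.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def pll_score(premise: str, hypothesis: str) -> float:
    """Pseudo-log-likelihood of the hypothesis given the premise, using the MLM head:
    mask one hypothesis token at a time and sum log P(token | rest of the full text)."""
    enc = tokenizer(premise, hypothesis, return_tensors="pt")
    ids = enc["input_ids"][0]
    # Positions belonging to the hypothesis (second segment of the pair encoding).
    hyp_pos = [i for i, s in enumerate(enc.sequence_ids(0)) if s == 1]
    total = 0.0
    for pos in hyp_pos:
        masked = ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[true_id].item()
    # Length-normalize so hypotheses of different lengths compare fairly (an assumption here).
    return total / max(len(hyp_pos), 1)

# Usage: rank candidate hypotheses for a COPA-style premise without any fine-tuning.
premise = "The man broke his toe because"
candidates = ["he dropped a hammer on his foot.", "he got a hole in his sock."]
print(max(candidates, key=lambda h: pll_score(premise, h)))
```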