2024
Improving Document-Level Translation through Negative Oversampling and Targeted Masking (Améliorer la traduction au niveau du document grâce au sur-échantillonnage négatif et au masquage ciblé)
Gaëtan Caillaut | Mariam Nakhlé | Jingshu Liu | Raheel Qader
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position
This work aims to improve the ability of machine translation systems to take into account the context in which the source sentence appears and thus, ultimately, to improve their overall performance. The approach we propose relies solely on the data and on how it is fed to the model during training, and is completely agnostic to the model architecture. We show that the performance of translation models on the en-fr pair can be improved simply by providing data that is more relevant to the target task, without modifying or complicating existing architectures, in particular the Transformer architecture commonly used by modern NLP systems. To this end, we present two data augmentation strategies (negative oversampling and targeted masking) designed to encourage the model to rely on context. We show, through appropriate metrics, that these methods improve the performance of translation systems without modifying either the model architecture or the training process.
Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task
Gaëtan Caillaut | Mariam Nakhlé | Raheel Qader | Jingshu Liu | Jean-Gabriel Barthélemy
Proceedings of the Ninth Conference on Machine Translation
Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet the machine translation field has been largely dominated by encoder-decoder models based on the Transformer architecture. As a consequence, scaling laws of encoder-decoder models for neural machine translation have already been well studied, but decoder-only models have received less attention. This work explores the scaling laws of decoder-only models on the multilingual and multidomain translation task. We trained a collection of six decoder-only models, ranging from 70M to 7B parameters, on a sentence-level, multilingual (8 languages) and multidomain (9 domains) dataset. We conducted a series of experiments showing that the loss of decoder-only models can be estimated using a scaling law similar to the one discovered for large language models, but we also show that this scaling law struggles to generalize to models that are too large or to a different data distribution. We also study different scaling methods and show that scaling the depth and scaling the width of a model lead to similar test loss improvements, but with different impacts on the model’s efficiency.
2023
Large Language Model Adaptation for Financial Sentiment Analysis
Pau Rodriguez Inserte | Mariam Nakhlé | Raheel Qader | Gaetan Caillaut | Jingshu Liu
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing
Natural language processing (NLP) has recently gained relevance within financial institutions by providing highly valuable insights into the financial documents of companies and markets. However, the financial domain presents extra challenges for NLP, due to the complexity of the texts and the use of specific terminology. Generalist language models tend to fall short in tasks specifically tailored for finance, even when using large language models (LLMs) with great natural language understanding and generative capabilities. This paper presents a study on LLM adaptation methods targeted at the financial domain, with a strong emphasis on financial sentiment analysis. To this end, two foundation models with fewer than 1.5B parameters have been adapted using a wide range of strategies. We show that through careful fine-tuning on both financial documents and instructions, these foundation models can be adapted to the target domain. Moreover, we observe that small LLMs have comparable performance to larger-scale models, while being more efficient in terms of parameters and data. In addition to the models, we show how to generate artificial instructions through LLMs to augment the number of samples in the instruction dataset.