Luc Lamontagne


2024

SMARTR: A Framework for Early Detection using Survival Analysis of Longitudinal Texts
Jean-Thomas Baillargeon | Luc Lamontagne
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

This paper presents an innovative approach to the early detection of expensive insurance claims by leveraging survival analysis concepts within a deep learning framework exploiting textual information from claims notes. Our proposed SMARTR model addresses limitations of state-of-the-art models, such as handling data-label mismatches and non-uniform data frequency, to enhance a posteriori classification and early detection. Our results suggest that incorporating temporal dynamics and empty period representation improves model performance, highlighting the importance of considering time in insurance claim analysis. The approach appears promising for application to other insurance datasets.

2023

Guided Beam Search to Improve Generalization in Low-Resource Data-to-Text Generation
Nicolas Garneau | Luc Lamontagne
Proceedings of the 16th International Natural Language Generation Conference

In this paper, we introduce a new beam search algorithm that improves the generalization of neural generators to unseen examples, especially in low-resource data-to-text settings. Our algorithm aims to reduce the number of omissions and hallucinations during the decoding process. For this purpose, it relies on two regression models to explicitly characterize factual errors. We explain how to create a new dataset to train these models given an original training set of fewer than a thousand data points. We apply our approach in a low-resource legal setting using the French Plum2Text dataset, as well as in English using WebNLG. We observe in our experiments that this combination improves the faithfulness of pre-trained neural text generators according to both human and automatic evaluation. Moreover, our approach offers a level of interpretability by predicting the number of omissions and hallucinations present in a given generation with respect to the input data. Finally, we visualize our algorithm’s exploration of the hypothesis space at different steps during the decoding process.

2022

Evaluating Legal Accuracy of Neural Generators on the Generation of Criminal Court Dockets Description
Nicolas Garneau | Eve Gaumond | Luc Lamontagne | Pierre-Luc Déziel
Proceedings of the 15th International Conference on Natural Language Generation

2021

Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator
Nicolas Garneau | Luc Lamontagne
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

In this paper, we introduce a new embedding-based metric relying on trainable ranking models to evaluate the semantic accuracy of neural data-to-text generators. This metric is especially well suited to semantically and factually assess the performance of a text generator when tables can be associated with multiple references and table values contain textual utterances. We first present how one can implement and further specialize the metric by training the underlying ranking models on a legal Data-to-Text dataset. We show how it may provide a more robust evaluation than other evaluation schemes in challenging settings using a dataset comprising paraphrases between the table values and their respective references. Finally, we evaluate its generalization capabilities on a well-known dataset, WebNLG, by comparing it with human evaluation and a recently introduced metric based on natural language inference. We then illustrate how it naturally characterizes, both quantitatively and qualitatively, omissions and hallucinations.

Shared Task in Evaluating Accuracy: Leveraging Pre-Annotations in the Validation Process
Nicolas Garneau | Luc Lamontagne
Proceedings of the 14th International Conference on Natural Language Generation

We present our submission to the Shared Task in Evaluating Accuracy at the INLG 2021 Conference. Our evaluation protocol relies on three main components: rules and text classifiers that pre-annotate the dataset, a human annotator who validates the pre-annotations, and a web interface that facilitates this validation. Our submission in fact consists of two submissions: we first analyze solely the performance of the rules and classifiers (pre-annotations), and then the human evaluation aided by these pre-annotations using the web interface (hybrid). The code for the web interface and the classifiers is publicly available.

2020

A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well
Nicolas Garneau | Mathieu Godbout | David Beauchemin | Audrey Durand | Luc Lamontagne
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we reproduce the experiments of Artetxe et al. (2018b) regarding the robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. We show that the reproduction of their method is indeed feasible with some minor assumptions. We further investigate the robustness of their model by introducing four new languages that are less similar to English than the ones proposed by the original paper. In order to assess the stability of their model, we also conduct a grid search over sensible hyperparameters. We then propose key recommendations that apply to any research project in order to deliver fully reproducible research.

Generating Intelligible Plumitifs Descriptions: Use Case Application with Ethical Considerations
David Beauchemin | Nicolas Garneau | Eve Gaumond | Pierre-Luc Déziel | Richard Khoury | Luc Lamontagne
Proceedings of the 13th International Conference on Natural Language Generation

Plumitifs (dockets) were initially a tool for law clerks. Nowadays, they are used as summaries presenting all the steps of a judicial case. Information concerning the parties’ identity, the jurisdiction in charge of administering the case, and the nature and course of the proceedings is available through plumitifs. They are publicly accessible but barely understandable; they are written using abbreviations and refer to provisions from the Criminal Code of Canada, which makes them hard to reason about. In this paper, we propose a simple yet efficient multi-source language generation architecture that leverages both the plumitif and the Criminal Code’s content to generate intelligible plumitifs descriptions. Ethical considerations arise when these sensitive documents are made readable and available at scale; we address these legitimate concerns in this paper. This is, to the best of our knowledge, the first application of plumitifs descriptions generation made available for French speakers along with an ethical discussion about the topic.

2018

Predicting and interpreting embeddings for out of vocabulary words in downstream tasks
Nicolas Garneau | Jean-Samuel Leboeuf | Luc Lamontagne
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We propose a novel way to handle out-of-vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which they appear. Our model also incorporates an attention mechanism indicating the focus allocated to the left context words, the right context words or the word’s characters, hence making the prediction more interpretable. The model is a “drop-in” module that is jointly trained with the downstream task’s neural network, thus producing embeddings specialized for the task at hand. When the task is mostly syntactic, we observe that our model focuses most of its attention on surface form characters. On the other hand, for more semantic tasks, the network allocates more attention to the surrounding words. In all our tests, the module helps the network achieve better performance than simple random embeddings do.