Lucas Georges Gabriel Charpentier

Also published as: Lucas Charpentier


2024

Compositional Generalization with Grounded Language Models
Sondre Wold | Étienne Simon | Lucas Charpentier | Egor Kostylev | Erik Velldal | Lilja Øvrelid
Findings of the Association for Computational Linguistics: ACL 2024

Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for a controlled evaluation of the degree to which these models learn and generalize from patterns in knowledge graphs. We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality and further avoids grounding the language models in information already encoded implicitly in their weights. We evaluate existing methods for combining language models with knowledge graphs and find them to struggle with generalization to sequences of unseen lengths and to novel combinations of seen base components. While our experimental results provide some insight into the expressive power of these models, we hope our work and released datasets motivate future research on how to better combine language models with structured knowledge representations.

More room for language: Investigating the effect of retrieval on language models
David Samuel | Lucas Charpentier | Sondre Wold
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Retrieval-augmented language models pose a promising alternative to standard language modeling. During pretraining, these models search in a corpus of documents for contextually relevant information that could aid the language modeling objective. We introduce an ‘ideal retrieval’ methodology to study these models in a fully controllable setting. We conduct an extensive evaluation to examine how retrieval augmentation affects the behavior of the underlying language model. Among other things, we observe that these models: (i) save substantially less world knowledge in their weights, (ii) are better at understanding local context and inter-word dependencies, but (iii) are worse at comprehending global context.

2023

Not all layers are equally as important: Every Layer Counts BERT
Lucas Georges Gabriel Charpentier | David Samuel
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning

BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer
Lucas Charpentier | Sondre Wold | David Samuel | Egil Rønningstad
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Retrieval-based language models are increasingly employed in question-answering tasks. These models search in a corpus of documents for relevant information instead of having all factual knowledge stored in their parameters, thereby enhancing efficiency, transparency, and adaptability. We develop the first Norwegian retrieval-based model by adapting the REALM framework and evaluate it on various tasks. After training, we also separate the language model, which we call the reader, from the retriever components, and show that the reader can be fine-tuned on a range of downstream tasks. Results show that retrieval-augmented language modeling improves the reader’s performance on extractive question-answering, suggesting that this type of training improves language models’ general ability to use context and that this does not happen at the expense of other abilities such as part-of-speech tagging, dependency parsing, named entity recognition, and lemmatization. Code, trained models, and data are made publicly available.