2024
Compositional Generalization with Grounded Language Models
Sondre Wold | Étienne Simon | Lucas Charpentier | Egor Kostylev | Erik Velldal | Lilja Øvrelid
Findings of the Association for Computational Linguistics: ACL 2024
Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for a controlled evaluation of the degree to which these models learn and generalize from patterns in knowledge graphs. We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality and further avoids grounding the language models in information already encoded implicitly in their weights. We evaluate existing methods for combining language models with knowledge graphs and find them to struggle with generalization to sequences of unseen lengths and to novel combinations of seen base components. While our experimental results provide some insight into the expressive power of these models, we hope our work and released datasets motivate future research on how to better combine language models with structured knowledge representations.
Generative Approaches to Event Extraction: Survey and Outlook
Étienne Simon | Helene Olsen | Huiling You | Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the Workshop on the Future of Event Detection (FuturED)
Socio-political Events of Conflict and Unrest: A Survey of Available Datasets
Helene Olsen | Étienne Simon | Erik Velldal | Lilja Øvrelid
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
There is a large and growing body of literature on datasets created to facilitate the study of socio-political events of conflict and unrest. However, the datasets, and the approaches taken to create them, vary considerably depending on the type of research they are intended to support. For example, while scholars from natural language processing (NLP) tend to focus on annotating specific spans of text indicating various components of an event, scholars from the disciplines of political science and conflict studies tend to focus on creating databases that code an abstract but structured representation of the event, less tied to a specific source text. The survey presented in this paper aims to map out the current landscape of available event datasets within the domain of social and political conflict and unrest – both from the NLP and political science communities – offering a unified view of the work done across different disciplines.
2022
Fine-tuning and Sampling Strategies for Multimodal Role Labeling of Entities under Class Imbalance
Syrielle Montariol | Étienne Simon | Arij Riabi | Djamé Seddah
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations
We present our solution to the multimodal semantic role labeling task from the CONSTRAINT’22 workshop. The task aims at classifying entities in memes into classes such as “hero” and “villain”. We use several pre-trained multimodal models to jointly encode the text and image of the memes, and implement three systems to classify the role of the entities. We propose dynamic sampling strategies to tackle the issue of class imbalance. Finally, we perform a qualitative analysis of the representations of the entities.
2019
Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Étienne Simon | Vincent Guigue | Benjamin Piwowarski
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted on average. These losses improve the performance of discriminative models, and enable us to train deep neural networks satisfactorily, surpassing the current state of the art on three different datasets.
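The two losses named in this abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the choices here are assumptions: the confidence term is modeled as the mean entropy of each sentence's predicted relation distribution, and the balance term as the KL divergence between the batch-average prediction and a uniform distribution. The function names `skewness_loss` and `distribution_distance_loss` are hypothetical and are not taken from the authors' code.

```python
import math


def skewness_loss(probs):
    """Mean entropy of the per-sentence relation distributions.

    Assumed form: low entropy means each sentence is assigned
    one relation with high confidence.
    """
    total = 0.0
    for p in probs:
        total += -sum(q * math.log(q) for q in p if q > 0)
    return total / len(probs)


def distribution_distance_loss(probs):
    """KL divergence from the batch-average prediction to uniform.

    Assumed form: zero when every relation is predicted equally
    often on average, positive when some relations are neglected.
    """
    n_rel = len(probs[0])
    avg = [sum(p[r] for p in probs) / len(probs) for r in range(n_rel)]
    return sum(a * math.log(a * n_rel) for a in avg if a > 0)


# Each row is one sentence's predicted distribution over two relations.
confident_and_balanced = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]      # always predicts relation 0
uninformative = [[0.5, 0.5], [0.5, 0.5]]  # never commits to a relation
```

Under these assumptions, the collapsed batch is penalized only by the balance term and the uninformative batch only by the confidence term, which suggests why the two are used together.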