Loic De Langhe

Also published as: Loic de Langhe


2024

Unsupervised Authorship Attribution for Medieval Latin Using Transformer-Based Embeddings
Loic De Langhe | Orphee De Clercq | Veronique Hoste
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

We explore the potential of employing transformer-based embeddings in an unsupervised authorship attribution task for medieval Latin. The development of Large Language Models (LLMs) and recent advances in transfer learning alleviate many of the traditional issues associated with authorship attribution in lower-resourced (ancient) languages. Despite this, these methods remain heavily understudied within this domain. Concretely, we generate strong contextual embeddings using a variety of mono- and multilingual transformer models and use these as input for two unsupervised clustering methods: a standard agglomerative clustering algorithm and a self-organizing map. We show that these transformer-based embeddings can be used to generate high-quality and interpretable clusterings, resulting in an attractive alternative to the traditional feature-based methods.

Enhancing Unrestricted Cross-Document Event Coreference with Graph Reconstruction Networks
Loic de Langhe | Orphee de Clercq | Veronique Hoste
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Event Coreference Resolution remains a challenging discourse-oriented task within the domain of Natural Language Processing. In this paper, we propose a methodology in which we combine traditional mention-pair coreference models with a lightweight and modular graph reconstruction algorithm. We show that building graph models on top of existing mention-pair models improves performance both for a wide range of baseline mention-pair algorithms and for a recently developed state-of-the-art model, at virtually no added computational cost. Moreover, additional experiments seem to indicate that our method is highly robust in low-data settings and that its performance scales with improvements in the underlying mention-pair models.

2023

What Does BERT actually Learn about Event Coreference? Probing Structural Information in a Fine-Tuned Dutch Language Model
Loic De Langhe | Orphee De Clercq | Veronique Hoste
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP

We probe structural and discourse aspects of coreferential relationships in a fine-tuned Dutch BERT event coreference model. Previous research has suggested that no such knowledge is encoded in BERT-based models and that the classification of coreferential relationships ultimately rests on outward lexical similarity. While we show that BERT can encode a (very) limited number of these discourse aspects (thus disproving assumptions in earlier research), we also note that knowledge of many structural features of coreferential relationships is absent from the encodings generated by the fine-tuned BERT model.

Filling in the Gaps: Efficient Event Coreference Resolution using Graph Autoencoder Networks
Loic De Langhe | Orphee De Clercq | Veronique Hoste
Proceedings of The Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)

Leveraging Structural Discourse Information for Event Coreference Resolution in Dutch
Loic De Langhe | Orphee De Clercq | Veronique Hoste
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

We directly embed easily extractable discourse structure information (subsection, paragraph and text type) in a transformer-based Dutch event coreference resolution model in order to provide it more explicitly with structural information that is known to be important in coreferential relationships. Results show that integrating this type of knowledge leads to a significant improvement in CoNLL F1 for within-document settings (+8.6%) and a minor improvement for cross-document settings (+1.1%).

2022

Investigating Cross-Document Event Coreference for Dutch
Loic De Langhe | Orphee De Clercq | Veronique Hoste
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

In this paper, we present baseline results for Event Coreference Resolution (ECR) in Dutch using gold-standard (i.e., non-predicted) event mentions. A newly developed benchmark dataset allows us to properly investigate the possibility of creating ECR systems for both within- and cross-document coreference. We give an overview of the state of the art for ECR in other languages, as well as a detailed overview of existing ECR resources. Afterwards, we provide a comparative report on our own dataset. We apply a significant number of approaches that have been shown to attain good results for English ECR, including feature-based models, monolingual transformer language models and multilingual language models. The best results were obtained using the monolingual BERTje model. Finally, results for all models are thoroughly analysed and visualised, so as to provide insight into the inner workings of ECR and long-distance semantic NLP tasks in general.