Sondre Wold


2023

Text-To-KG Alignment: Comparing Current Methods on Classification Tasks
Sondre Wold | Lilja Øvrelid | Erik Velldal
Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)

In contrast to large text corpora, knowledge graphs (KGs) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that retrieve information from KGs like ConceptNet as additional context. Many of these models consist of multiple components, and although they differ in the number and nature of these parts, they all attempt, for a given text query, to identify and retrieve a relevant subgraph from the KG. Due to the noise and idiosyncrasies often found in KGs, it is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query. In this work, we try to bridge this knowledge gap by reviewing current approaches to text-to-KG alignment and evaluating them on two datasets where manually created graphs are available, providing insights into the effectiveness of current methods. We release our code for reproducibility.

NorQuAD: Norwegian Question Answering Dataset
Sardana Ivanova | Fredrik Andreassen | Matias Jentoft | Sondre Wold | Lilja Øvrelid
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

In this paper, we present NorQuAD, the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.

BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer
Lucas Charpentier | Sondre Wold | David Samuel | Egil Rønningstad
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Retrieval-based language models are increasingly employed in question-answering tasks. These models search a corpus of documents for relevant information instead of storing all factual knowledge in their parameters, thereby enhancing efficiency, transparency, and adaptability. We develop the first Norwegian retrieval-based model by adapting the REALM framework and evaluate it on various tasks. After training, we also separate the language model, which we call the reader, from the retriever components, and show that the reader can be fine-tuned on a range of downstream tasks. Results show that retrieval-augmented language modeling improves the reader’s performance on extractive question answering, suggesting that this type of training improves language models’ general ability to use context and that this does not happen at the expense of other abilities such as part-of-speech tagging, dependency parsing, named entity recognition, and lemmatization. Code, trained models, and data are made publicly available.

2022

The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection
Sondre Wold
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

This paper studies the problem of injecting factual knowledge into large pre-trained language models. We train adapter modules on parts of the ConceptNet knowledge graph using the masked language modeling objective and evaluate the success of the method through a series of probing experiments on the LAMA probe. Mean P@K curves for different configurations indicate that the technique is effective, increasing the performance on subsets of the LAMA probe for large values of K while adding as little as 2.1% additional parameters to the original models.