Richard Johansson

2025

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Richard Johansson | Sara Stymne
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

pdf bib abs

Language model (LM) re-rankers are used to refine retrieval results for retrieval-augmented generation (RAG). They are more expensive than lexical matching methods like BM25 but assumed to better process semantic information and the relations between the query and the retrieved answers. To understand whether LM re-rankers always live up to this assumption, we evaluate 6 different LM re-rankers on the NQ, LitQA2 and DRUID datasets. Our results show that LM re-rankers struggle to outperform a simple BM25 baseline on DRUID. Leveraging a novel separation metric based on BM25 scores, we explain and identify re-ranker errors stemming from lexical dissimilarities. We also investigate different methods to improve LM re-ranker performance and find these methods mainly useful for NQ. Taken together, our work identifies and explains weaknesses of LM re-rankers and points to the need for more adversarial and realistic datasets for their evaluation.

pdf bib abs

Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova | Lovisa Hagström | Moa Johansson | Richard Johansson | Marco Kuhlmann
Findings of the Association for Computational Linguistics: ACL 2025

Language models (LMs) can make a correct prediction based on many possible signals in a prompt, not all corresponding to recall of factual associations. However, current interpretations of LMs fail to take this into account. For example, given the query “Astrid Lindgren was born in” with the corresponding completion “Sweden”, no difference is made between whether the prediction was based on knowing where the author was born or assuming that a person with a Swedish-sounding name was born in Sweden. In this paper, we present a model-specific recipe - PrISM - for constructing datasets with examples of four different prediction scenarios: generic language modeling, guesswork, heuristics recall and exact fact recall. We apply two popular interpretability methods to the scenarios: causal tracing (CT) and information flow analysis. We find that both yield distinct results for each scenario. Results for exact fact recall and generic language modeling scenarios confirm previous conclusions about the importance of mid-range MLP sublayers for fact recall, while results for guesswork and heuristics indicate a critical role of late last token position MLP sublayers. In summary, we contribute resources for a more extensive and granular study of fact completion in LMs, together with analyses that provide a more nuanced understanding of how LMs process fact-related queries.

pdf bib abs

Benchmarking Debiasing Methods for LLM-based Parameter Estimates
Nicolas Audinet de Pieuchon | Adel Daoud | Connor Thomas Jerzak | Moa Johansson | Richard Johansson
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) offer an inexpensive yet powerful way to annotate text, but are often inconsistent when compared with experts. These errors can bias downstream estimates of population parameters such as regression coefficients and causal effects. To mitigate this bias, researchers have developed debiasing methods such as Design-based Supervised Learning (DSL) and Prediction-Powered Inference (PPI), which promise valid estimation by combining LLM annotations with a limited number of expensive expert annotations.Although these methods produce consistent estimates under theoretical assumptions, it is unknown how they compare in finite samples of sizes encountered in applied research. We make two contributions: First, we study how each method’s performance scales with the number of expert annotations, highlighting regimes where LLM bias or limited expert labels significantly affect results. Second, we compare DSL and PPI across a range of tasks, finding that although both achieve low bias with large datasets, DSL often outperforms PPI on bias reduction and empirical efficiency, but its performance is less consistent across datasets. Our findings indicate that there is a bias-variance tradeoff at the level of debiasing methods, calling for more research on developing metrics for quantifying their efficiency in finite samples.

2024

pdf bib abs

Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models
Mehrdad Farahani | Richard Johansson
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Generative language models often struggle with specialized or less-discussed knowledge. A potential solution is found in Retrieval-Augmented Generation (RAG) models which act like retrieving information before generating responses. In this study, we explore how the Atlas approach, a RAG model, decides between what it already knows (parametric) and what it retrieves (non-parametric). We use causal mediation analysis and controlled experiments to examine how internal representations influence information processing. Our findings disentangle the effects of parametric knowledge and the retrieved context. They indicate that in cases where the model can choose between both types of information (parametric and non-parametric), it relies more on the context than the parametric knowledge. Furthermore, the analysis investigates the computations involved in how the model uses the information from the context. We find that multiple mechanisms are active within the model and can be detected with mediation analysis: first, the decision of whether the context is relevant, and second, how the encoder computes output representations to support copying when relevant.

pdf bib abs

Can Large Language Models (or Humans) Disentangle Text?
Nicolas Audinet de Pieuchon | Adel Daoud | Connor Jerzak | Moa Johansson | Richard Johansson
Proceedings of the Sixth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024)

We investigate the potential of large language models (LLMs) to disentangle text variables—to remove the textual traces of an undesired forbidden variable in a task sometimes known as text distillation and closely related to the fairness in AI and causal inference literature. We employ a range of various LLM approaches in an attempt to disentangle text by identifying and removing information about a target variable while preserving other relevant signals. We show that in the strong test of removing sentiment, the statistical association between the processed text and sentiment is still detectable to machine learning classifiers post-LLM-disentanglement. Furthermore, we find that human annotators also struggle to disentangle sentiment while preserving other semantic content. This suggests there may be limited separability between concept variables in some text contexts, highlighting limitations of methods relying on text-level transformations and also raising questions about the robustness of disentanglement methods that achieve statistical independence in representation space.

pdf bib abs

Transformer-based Swedish Semantic Role Labeling through Transfer Learning
Dana Dannélls | Richard Johansson | Lucy Yang Buhr
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Semantic Role Labeling (SRL) is a task in natural language understanding where the goal is to extract semantic roles for a given sentence. English SRL has achieved state-of-the-art performance using Transformer techniques and supervised learning. However, this technique is not a viable choice for smaller languages like Swedish due to the limited amount of training data. In this paper, we present the first effort in building a Transformer-based SRL system for Swedish by exploring multilingual and cross-lingual transfer learning methods and leveraging the Swedish FrameNet resource. We demonstrate that multilingual transfer learning outperforms two different cross-lingual transfer models. We also found some differences between frames in FrameNet that can either hinder or enhance the model’s performance. The resulting end-to-end model is freely available and will be made accessible through Språkbanken Text’s research infrastructure.

pdf bib abs

What Happens to a Dataset Transformed by a Projection-based Concept Removal Method?
Richard Johansson
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We investigate the behavior of methods using linear projections to remove information about a concept from a language representation, and we consider the question of what happens to a dataset transformed by such a method. A theoretical analysis and experiments on real-world and synthetic data show that these methods inject strong statistical dependencies into the transformed datasets. After applying such a method, the representation space is highly structured: in the transformed space, an instance tends to be located near instances of the opposite label. As a consequence, the original labeling can in some cases be reconstructed by applying an anti-clustering method.

2023

pdf bib abs

The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models
Lovisa Hagström | Denitsa Saynova | Tobias Norlund | Moa Johansson | Richard Johansson
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might supply the answer “Edinburgh” to “Anne Redpath passed away in X.” and “London” to “Anne Redpath’s life ended in X.” In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a passage retrieval database. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency but that retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated we find that syntactical form and task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.

pdf bib abs

On the Generalization Ability of Retrieval-Enhanced Transformers
Tobias Norlund | Ehsan Doostmohammadi | Richard Johansson | Marco Kuhlmann
Findings of the Association for Computational Linguistics: EACL 2023

Recent work on the Retrieval-Enhanced Transformer (RETRO) model has shown impressive results: off-loading memory from trainable weights to a retrieval database can significantly improve language modeling and match the performance of non-retrieval models that are an order of magnitude larger in size. It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval. In this paper, we try to better understand the relative contributions of these two components. We find that the performance gains from retrieval to a very large extent originate from overlapping tokens between the database and the test data, suggesting less of non-trivial generalization than previously assumed. More generally, our results point to the challenges of evaluating the generalization of retrieval-augmented language models such as RETRO, as even limited token overlap may significantly decrease test-time loss. We release our code and model at https://github.com/TobiasNorlund/retro

pdf bib abs

Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models
Ehsan Doostmohammadi | Tobias Norlund | Marco Kuhlmann | Richard Johansson
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Augmenting language models with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.

pdf bib abs

An Empirical Study of Multitask Learning to Improve Open Domain Dialogue Systems
Mehrdad Farahani | Richard Johansson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Autoregressive models used to generate responses in open-domain dialogue systems often struggle to take long-term context into account and to maintain consistency over a dialogue. Previous research in open-domain dialogue generation has shown that the use of auxiliary tasks can introduce inductive biases that encourage the model to improve these qualities. However, most previous research has focused on encoder-only or encoder/decoder models, while the use of auxiliary tasks in encoder-only autoregressive models is under-explored. This paper describes an investigation where four different auxiliary tasks are added to small and medium-sized GPT-2 models fine-tuned on the PersonaChat and DailyDialog datasets. The results show that the introduction of the new auxiliary tasks leads to small but consistent improvement in evaluations of the investigated models.

pdf bib abs

Class Explanations: the Role of Domain-Specific Content and Stop Words
Denitsa Saynova | Bastiaan Bruinsma | Moa Johansson | Richard Johansson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

We address two understudied areas related to explainability for neural text models. First, class explanations. What features are descriptive across a class, rather than explaining single input instances? Second, the type of features that are used for providing explanations. Does the explanation involve the statistical pattern of word usage or the presence of domain-specific content words? Here, we present a method to extract both class explanations and strategies to differentiate between two types of explanations – domain-specific signals or statistical variations in frequencies of common words. We demonstrate our method using a case study in which we analyse transcripts of political debates in the Swedish Riksdag.

2022

pdf bib abs

Controlling for Stereotypes in Multimodal Language Model Evaluation
Manuj Malik | Richard Johansson
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

We propose a methodology and design two benchmark sets for measuring to what extent language-and-vision language models use the visual signal in the presence or absence of stereotypes. The first benchmark is designed to test for stereotypical colors of common objects, while the second benchmark considers gender stereotypes. The key idea is to compare predictions when the image conforms to the stereotype to predictions when it does not. Our results show that there is significant variation among multimodal models: the recent Transformer-based FLAVA seems to be more sensitive to the choice of image and less affected by stereotypes than older CNN-based models such as VisualBERT and LXMERT. This effect is more discernible in this type of controlled setting than in traditional evaluations where we do not know whether the model relied on the stereotype or the visual signal.

pdf bib abs

How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?
Lovisa Hagström | Richard Johansson
Proceedings of the 29th International Conference on Computational Linguistics

Current language models have been criticised for learning language from text alone without connection between words and their meaning. Consequently, multimodal training has been proposed as a way for creating models with better language understanding by providing the lacking connection. We focus on pre-trained multimodal vision-and-language (VL) models for which there already are some results on their language understanding capabilities. An unresolved issue with evaluating the linguistic skills of these models, however, is that there is no established method for adapting them to text-only input without out-of-distribution uncertainty. To find the best approach, we investigate and compare seven possible methods for adapting three different pre-trained VL models to text-only input. Our evaluations on both GLUE and Visual Property Norms (VPN) show that care should be put into adapting VL models to zero-shot text-only tasks, while the models are less sensitive to how we adapt them to non-zero-shot tasks. We also find that the adaptation methods perform differently for different models and that unimodal model counterparts perform on par with the VL models regardless of adaptation, indicating that current VL models do not necessarily gain better language understanding from their multimodal training.

pdf bib abs

Cross-modal Transfer Between Vision and Language for Protest Detection
Ria Raj | Kajsa Andreasson | Tobias Norlund | Richard Johansson | Aron Lagerberg
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Most of today’s systems for socio-political event detection are text-based, while an increasing amount of information published on the web is multi-modal. We seek to bridge this gap by proposing a method that utilizes existing annotated unimodal data to perform event detection in another data modality, zero-shot. Specifically, we focus on protest detection in text and images, and show that a pretrained vision-and-language alignment model (CLIP) can be leveraged towards this end. In particular, our results suggest that annotated protest text data can act supplementarily for detecting protests in images, but significant transfer is demonstrated in the opposite direction as well.

pdf bib abs

Conceptualizing Treatment Leakage in Text-based Causal Inference
Adel Daoud | Connor Jerzak | Richard Johansson
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Causal inference methods that control for text-based confounders are becoming increasingly important in the social sciences and other disciplines where text is readily available. However, these methods rely on a critical assumption that there is no treatment leakage: that is, the text only contains information about the confounder and no information about treatment assignment. When this assumption does not hold, methods that control for text to adjust for confounders face the problem of post-treatment (collider) bias. However, the assumption that there is no treatment leakage may be unrealistic in real-world situations involving text, as human language is rich and flexible. Language appearing in a public policy document or health records may refer to the future and the past simultaneously, and thereby reveal information about the treatment assignment. In this article, we define the treatment-leakage problem, and discuss the identification as well as the estimation challenges it raises. Second, we delineate the conditions under which leakage can be addressed by removing the treatment-related signal from the text in a pre-processing step we define as text distillation. Lastly, using simulation, we show how treatment leakage introduces a bias in estimates of the average treatment effect (ATE) and how text distillation can mitigate this bias.

pdf bib abs

Can We Use Small Models to Investigate Multimodal Fusion Methods?
Lovisa Hagström | Tobias Norlund | Richard Johansson
Proceedings of the 2022 CLASP Conference on (Dis)embodiment

Many successful methods for fusing language with information from the visual modality have recently been proposed and the topic of multimodal training is ever evolving. However, it is still largely not known what makes different vision-and-language models successful. Investigations into this are made difficult by the large sizes of the models used, requiring large training datasets and causing long train and compute times. Therefore, we propose the idea of studying multimodal fusion methods in a smaller setting with small models and datasets. In this setting, we can experiment with different approaches for fusing multimodal information with language in a controlled fashion, while allowing for fast experimentation. We illustrate this idea with the math arithmetics sandbox. This is a setting in which we fuse language with information from the math modality and strive to replicate some fusion methods from the vision-and-language domain. We find that some results for fusion methods from the larger domain translate to the math arithmetics sandbox, indicating a promising future avenue for multimodal model prototyping.

pdf bib abs

What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
Lovisa Hagström | Richard Johansson
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

There are limitations in learning language from text alone. Therefore, recent focus has been on developing multimodal models. However, few benchmarks exist that can measure what language models learn about language from multimodal training. We hypothesize that training on a visual modality should improve on the visual commonsense knowledge in language models. Therefore, we introduce two evaluation tasks for measuring visual commonsense knowledge in language models (code publicly available at: github.com/lovhag/measure-visual-commonsense-knowledge) and use them to evaluate different multimodal models and unimodal baselines. Primarily, we find that the visual commonsense knowledge is not significantly different between the multimodal models and unimodal baseline models trained on visual text data.

2021

pdf bib abs

Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency
Lovisa Hagström | Richard Johansson
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

The current recipe for better model performance within NLP is to increase model size and training data. While it gives us models with increasingly impressive results, it also makes it more difficult to train and deploy state-of-the-art models for NLP due to increasing computational costs. Model compression is a field of research that aims to alleviate this problem. The field encompasses different methods that aim to preserve the performance of a model while decreasing the size of it. One such method is knowledge distillation. In this article, we investigate the effect of knowledge distillation for named entity recognition models in Swedish. We show that while some sequence tagging models benefit from knowledge distillation, not all models do. This prompts us to ask questions about in which situations and for which models knowledge distillation is beneficial. We also reason about the effect of knowledge distillation on computational costs.

pdf bib abs

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
Tobias Norlund | Lovisa Hagström | Richard Johansson
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Large language models are known to suffer from the hallucination problem in that they are prone to output statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution to this is to provide the model with additional data modalities that complements the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps, 1) a novel task querying for knowledge of memory colors, i.e. typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully be used to measure visual knowledge transfer capabilities in models and that our novel model architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.

2020

pdf bib abs

An Arabic Tweets Sentiment Analysis Dataset (ATSAD) using Distant Supervision and Self Training
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik | Motaz Saad | Richard Johansson
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

As the number of social media users increases, they express their thoughts, needs, socialise and publish their opinions reviews. For good social media sentiment analysis, good quality resources are needed, and the lack of these resources is particularly evident for languages other than English, in particular Arabic. The available Arabic resources lack of from either the size of the corpus or the quality of the annotation. In this paper, we present an Arabic Sentiment Analysis Corpus collected from Twitter, which contains 36K tweets labelled into positive and negative. We employed distant supervision and self-training approaches into the corpus to annotate it. Besides, we release an 8K tweets manually annotated as a gold standard. We evaluated the corpus intrinsically by comparing it to human classification and pre-trained sentiment analysis models, Moreover, we apply extrinsic evaluation methods exploiting sentiment analysis task and achieve an accuracy of 86%.

pdf bib abs

Training a Swedish Constituency Parser on Six Incompatible Treebanks
Richard Johansson | Yvonne Adesam
Proceedings of the Twelfth Language Resources and Evaluation Conference

We investigate a transition-based parser that uses Eukalyptus, a function-tagged constituent treebank for Swedish which includes discontinuous constituents. In addition, we show that the accuracy of this parser can be improved by using a multitask learning architecture that makes it possible to train the parser on additional treebanks that use other annotation models.

2019

pdf bib abs

Natural Language Processing in Policy Evaluation: Extracting Policy Conditions from IMF Loan Agreements
Joakim Åkerström | Adel Daoud | Richard Johansson
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Social science researchers often use text as the raw data in investigations: for instance, when investigating the effects of IMF policies on the development of countries under IMF programs, researchers typically encode structured descriptions of the programs using a time-consuming manual effort. Making this process automatic may open up new opportunities in scaling up such investigations. As a first step towards automatizing this coding process, we describe an experiment where we apply a sentence classifier that automatically detects mentions of policy conditions in IMF loan agreements and divides them into different types. The results show that the classifier is generally able to detect the policy conditions, although some types are hard to distinguish.

2018

pdf bib abs

The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
Murhaf Fares | Stephan Oepen | Lilja Øvrelid | Jari Björne | Richard Johansson
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating teams. Based on in-depth quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morph-syntactic analysis with observed downstream behavior.

pdf bib abs

Automatically Linking Lexical Resources with Word Sense Embedding Models
Luis Nieto-Piña | Richard Johansson
Proceedings of the Third Workshop on Semantic Deep Learning

Automatically learnt word sense embeddings are developed as an attempt to refine the capabilities of coarse word embeddings. The word sense representations obtained this way are, however, sensitive to underlying corpora and parameterizations, and they might be difficult to relate to formally defined word senses. We propose to tackle this problem by devising a mechanism to establish links between word sense embeddings and lexical resources created by experts. We evaluate the applicability of these links in a task to retrieve instances of word sense unlisted in the lexicon.

2017

pdf bib abs

Training Word Sense Embeddings With Lexicon-based Regularization
Luis Nieto-Piña | Richard Johansson
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose to improve word sense embeddings by enriching an automatic corpus-based method with lexicographic data. Information from a lexicon is introduced into the learning algorithm’s objective function through a regularizer. The incorporation of lexicographic data yields embeddings that are able to reflect expert-defined word senses, while retaining the robustness, high quality, and coverage of automatic corpus-based methods. These properties are observed in a manual inspection of the semantic clusters that different degrees of regularizer strength create in the vector space. Moreover, we evaluate the sense embeddings in two downstream applications: word sense disambiguation and semantic frame prediction, where they outperform simpler approaches. Our results show that a corpus-based model balanced with lexicographic data learns better representations and improve their performance in downstream tasks.

pdf bib abs

Character-based recurrent neural networks for morphological relational reasoning
Olof Mogren | Richard Johansson
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a model for predicting word forms based on morphological relational reasoning with analogies. While previous work has explored tasks such as morphological inflection and reinflection, these models rely on an explicit enumeration of morphological features, which may not be available in all cases. To address the task of predicting a word form given a demo relation (a pair of word forms) and a query word, we devise a character-based recurrent neural network architecture using three separate encoders and a decoder. We also investigate a multiclass learning setup, where the prediction of the relation type label is used as an auxiliary task. Our results show that the exact form can be predicted for English with an accuracy of 94.7%. For Swedish, which has a more complex morphology with more inflectional patterns for nouns and verbs, the accuracy is 89.3%. We also show that using the auxiliary task of learning the relation type speeds up convergence and improves the prediction accuracy for the word generation task.

2016

pdf bib abs

Retrieving Occurrences of Grammatical Constructions
Anna Ehrlemark | Richard Johansson | Benjamin Lyngfelt
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Finding authentic examples of grammatical constructions is central in constructionist approaches to linguistics, language processing, and second language learning. In this paper, we address this problem as an information retrieval (IR) task. To facilitate research in this area, we built a benchmark collection by annotating the occurrences of six constructions in a Swedish corpus. Furthermore, we implemented a simple and flexible retrieval system for finding construction occurrences, in which the user specifies a ranking function using lexical-semantic similarities (lexicon-based or distributional). The system was evaluated using standard IR metrics on the new benchmark, and we saw that lexical-semantical rerankers improve significantly over a purely surface-oriented system, but must be carefully tailored for each individual construction.

pdf bib abs

Gulf Arabic Linguistic Resource Building for Sentiment Analysis
Wafia Adouane | Richard Johansson
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper deals with building linguistic resources for Gulf Arabic, one of the Arabic variations, for sentiment analysis task using machine learning. To our knowledge, no previous works were done for Gulf Arabic sentiment analysis despite the fact that it is present in different online platforms. Hence, the first challenge is the absence of annotated data and sentiment lexicons. To fill this gap, we created these two main linguistic resources. Then we conducted different experiments: use Naive Bayes classifier without any lexicon; add a sentiment lexicon designed basically for MSA; use only the compiled Gulf Arabic sentiment lexicon and finally use both MSA and Gulf Arabic sentiment lexicons. The Gulf Arabic lexicon gives a good improvement of the classifier accuracy (90.54 %) over a baseline that does not use the lexicon (82.81%), while the MSA lexicon causes the accuracy to drop to (76.83%). Moreover, mixing MSA and Gulf Arabic lexicons causes the accuracy to drop to (84.94%) compared to using only Gulf Arabic lexicon. This indicates that it is useless to use MSA resources to deal with Gulf Arabic due to the considerable differences and conflicting structures between these two languages.

pdf bib abs

A Multi-domain Corpus of Swedish Word Sense Annotation
Richard Johansson | Yvonne Adesam | Gerlof Bouma | Karin Hedberg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.

pdf bib abs

Romanized Berber and Romanized Arabic Automatic Language Identification Using Machine Learning
Wafia Adouane | Nasredine Semmar | Richard Johansson
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

The identification of the language of text/speech input is the first step to be able to properly do any language-dependent natural language processing. The task is called Automatic Language Identification (ALI). Being a well-studied field since early 1960’s, various methods have been applied to many standard languages. The ALI standard methods require datasets for training and use character/word-based n-gram models. However, social media and new technologies have contributed to the rise of informal and minority languages on the Web. The state-of-the-art automatic language identifiers fail to properly identify many of them. Romanized Arabic (RA) and Romanized Berber (RB) are cases of these informal languages which are under-resourced. The goal of this paper is twofold: detect RA and RB, at a document level, as separate languages and distinguish between them as they coexist in North Africa. We consider the task as a classification problem and use supervised machine learning to solve it. For both languages, character-based 5-grams combined with additional lexicons score the best, F-score of 99.75% and 97.77% for RB and RA respectively.

pdf bib

Embedding Senses for Efficient Graph-based Word Sense Disambiguation
Luis Nieto Piña | Richard Johansson
Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing

pdf bib abs

ASIREM Participation at the Discriminating Similar Languages Shared Task 2016
Wafia Adouane | Nasredine Semmar | Richard Johansson
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper presents the system built by ASIREM team for the Discriminating between Similar Languages (DSL) Shared task 2016. It describes the system which uses character-based and word-based n-grams separately. ASIREM participated in both sub-tasks (sub-task 1 and sub-task 2) and in both open and closed tracks. For the sub-task 1 which deals with Discriminating between similar languages and national language varieties, the system achieved an accuracy of 87.79% on the closed track, ending up ninth (the best results being 89.38%). In sub-task 2, which deals with Arabic dialect identification, the system achieved its best performance using character-based n-grams (49.67% accuracy), ranking fourth in the closed track (the best result being 51.16%), and an accuracy of 53.18%, ranking first in the open track.

pdf bib abs

Automatic Detection of Arabicized Berber and Arabic Varieties
Wafia Adouane | Nasredine Semmar | Richard Johansson | Victoria Bobicev
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is the first necessary step to do any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically processed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facilitated the emergence of written format for these spoken languages based on pronunciation. The latter are not well represented on the Web, commonly referred to as under-resourced languages, and the current available ALI tools fail to properly recognize them. In this paper, we revisit the problem of ALI with the focus on Arabicized Berber and dialectal Arabic short texts. We introduce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%.

We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These method uses a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.

pdf bib

Non-atomic Classification to Improve a Semantic Role Labeler for a Low-resource Language
Richard Johansson
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib

Global Features for Shallow Discourse Parsing
Sucheta Ghosh | Giuseppe Riccardi | Richard Johansson
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib

Search Result Diversification Methods to Assist Lexicographers
Lars Borin | Markus Forsberg | Karin Friberg Heppin | Richard Johansson | Annika Kjellandsson
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib

Modeling Topic Dependencies in Hierarchical Text Categorization
Alessandro Moschitti | Qi Ju | Richard Johansson
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib abs

Semantic Role Labeling with the Swedish FrameNet
Richard Johansson | Karin Friberg Heppin | Dimitrios Kokkinakis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present the first results on semantic role labeling using the Swedish FrameNet, which is a lexical resource currently in development. Several aspects of the task are investigated, including the %design and selection of machine learning features, the effect of choice of syntactic parser, and the ability of the system to generalize to new frames and new genres. In addition, we evaluate two methods to make the role label classifier more robust: cross-frame generalization and cluster-based features. Although the small amount of training data limits the performance achievable at the moment, we reach promising results. In particular, the classifier that extracts the boundaries of arguments works well for new frames, which suggests that it already at this stage can be useful in a semi-automatic setting.

pdf bib

2011

pdf bib

Extracting Opinion Expressions and Their Polarities – Exploration of Pipelines and Joint Models
Richard Johansson | Alessandro Moschitti
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib

Shallow Discourse Parsing with Conditional Random Fields
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib

Syntactic and Semantic Structure for Opinion Expression Detection
Richard Johansson | Alessandro Moschitti
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

pdf bib abs

A Flexible Representation of Heterogeneous Annotation Data
Richard Johansson | Alessandro Moschitti
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files. We argue that existing frameworks are not suitable for this purpose, most importantly because they do not easily generalize to multi-document and multimodal corpora, and because they often require the use of particular software frameworks. In the paper, we define a data model to represent such structured data over multimodal collections. Furthermore, we define a surface realization of the data structure as a simple and readable XML format. We present two examples of annotation tasks to illustrate how the representation and format work for complex structures involving multimodal annotation and cross-document links. The representation described here has been used in a large-scale project focusing on the annotation of a wide range of information ― from low-level features to high-level semantics ― in a multimodal data collection containing both text and images.

pdf bib

Reranking Models in Fine-grained Opinion Analysis
Richard Johansson | Alessandro Moschitti
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf bib

pdf bib

Statistical Bistratal Dependency Parsing
Richard Johansson
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib

Text Categorization Using Predicate-Argument Structures
Jacob Persson | Richard Johansson | Pierre Nugues
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2008

pdf bib

Dependency-based Syntactic–Semantic Analysis with PropBank and NomBank
Richard Johansson | Pierre Nugues
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib

Dependency-based Semantic Role Labeling of PropBank
Richard Johansson | Pierre Nugues
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib

The Effect of Syntactic Representation on Semantic Role Labeling
Richard Johansson | Pierre Nugues
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib

The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies
Mihai Surdeanu | Richard Johansson | Adam Meyers | Lluís Màrquez | Joakim Nivre
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib abs

Comparing Dependency and Constituent Syntax for Frame-semantic Analysis
Richard Johansson | Pierre Nugues
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We address the question of which syntactic representation is best suited for role-semantic analysis of English in the FrameNet paradigm. We compare systems based on dependencies and constituents, and a dependency syntax with a rich set of grammatical functions with one with a smaller set. Our experiments show that dependency-based and constituent-based analyzers give roughly equivalent performance, and that a richer set of functions has a positive influence on argument classification for verbs.

2007

pdf bib

Logistic Online Learning Methods and Their Application to Incremental Dependency Parsing
Richard Johansson
Proceedings of the ACL 2007 Student Research Workshop

pdf bib

LTH: Semantic Structure Extraction using Nonprojective Dependency Trees
Richard Johansson | Pierre Nugues
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib

Incremental Dependency Parsing Using Online Learning
Richard Johansson | Pierre Nugues
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib

Extended Constituent-to-Dependency Conversion for English
Richard Johansson | Pierre Nugues
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

pdf bib

Investigating Multilingual Dependency Parsing
Richard Johansson | Pierre Nugues
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib

Automatic Annotation for All Semantic Layers in FrameNet
Richard Johansson | Pierre Nugues
Demonstrations

pdf bib

A FrameNet-Based Semantic Role Labeler for Swedish
Richard Johansson | Pierre Nugues
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib abs

Construction of a FrameNet Labeler for Swedish Text
Richard Johansson | Pierre Nugues
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe the implementation of a FrameNet-based semantic role labeling system for Swedish text. To train the system, we used a semantically annotated corpus that was produced by projection across parallel corpora. As part of the system, we developed two frame element bracketing algorithms that are suitable when no robust constituent parsers are available. Apart from being the first such system for Swedish, this is, as far as we are aware, the first semantic role labeling system for a language for which no role-semantic annotated corpora are available. The estimated accuracy of classification of pre-segmented frame elements is 0.75, and the precision and recall measures for the complete task are 0.67 and 0.47, respectively.

pdf bib

A Machine Learning Approach to Extract Temporal Information from Texts in Swedish and Generate Animated 3D Scenes
Anders Berglund | Richard Johansson | Pierre Nugues
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs

Extraction of Temporal Information from Texts in Swedish
Anders Berglund | Richard Johansson | Pierre Nugues
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the implementation and evaluation of a generic component to extract temporal information from texts in Swedish. It proceeds in two steps. The first step extracts time expressions and events, and generates a feature vector for each element it identifies. Using the vectors, the second step determines the temporal relations, possibly none, between the extracted events and orders them in time. We used a machine learning approach to find the relations between events. To run the learning algorithm, we collected a corpus of road accident reports from newspapers websites that we manually annotated. It enabled us to train decision trees and to evaluate the performance of the algorithm.