Anders Johannsen - ACL Anthology

Anders Johannsen

Also published as: Anders Johanssen

2025

ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
Alexandru Coca | Mark Gaynor | Zhenxing Zhang | Jianpeng Cheng | Bo-Hsiang Tseng | Peter Boothroyd | Hector Martinez Alonso | Diarmuid O Seaghdha | Anders Johannsen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. Such assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.

PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback
Alexandru Coca | Bo-Hsiang Tseng | Peter Boothroyd | Jianpeng Cheng | Zhenxing Zhang | Mark Gaynor | Joe Stacey | Tristan Guigue | Héctor Martínez Alonso | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Programmable task-oriented dialogue (TOD) agents enable language models to follow structured dialogue policies, but their effectiveness hinges on accurate dialogue state tracking (DST). We present PyTOD, an agent that generates executable code to track dialogue state and uses policy and execution feedback for efficient error correction. To achieve this, PyTOD employs a simple constrained decoding approach, using a language model instead of grammar rules to follow API schemata. This leads to state-of-the-art DST performance on the challenging SGD benchmark. Our experiments show that PyTOD surpasses strong baselines in both accuracy and cross-turn consistency, demonstrating the effectiveness of execution-aware state tracking.

2024

Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage
Yichen Jiang | Marco Vecchio | Mohit Bansal | Anders Johannsen
Findings of the Association for Computational Linguistics: EACL 2024

Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries
Seanie Lee | Jianpeng Cheng | Joris Driesen | Alexandru Coca | Anders Johannsen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Few-shot dialogue state tracking (DST) with Large Language Models (LLM) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning. Previous works use raw dialogue context as search keys and queries, and a retriever is fine-tuned with annotated dialogues to achieve superior performance. However, the approach is less suited for scaling to new domains or new annotation languages, where fine-tuning data is unavailable. To address this problem, we handle the task of conversation retrieval based on text summaries of the conversations.A LLM-based conversation summarizer is adopted for query and key generation, which enables effective maximum inner product search. To avoid the extra inference cost brought by LLM-based conversation summarization, we further distill a light-weight conversation encoder which produces query embeddings without decoding summaries for test conversations. We validate our retrieval approach on MultiWOZ datasets with GPT-Neo-2.7B and LLaMA-7B/30B. The experimental results show a significant improvement over relevant baselines in real few-shot DST settings.

LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues
Joe Stacey | Jianpeng Cheng | John Torr | Tristan Guigue | Joris Driesen | Alexandru Coca | Mark Gaynor | Anders Johannsen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Spurred by recent advances in Large Language Models (LLMs), virtual assistants are poised to take a leap forward in terms of their dialogue capabilities. Yet a major bottleneck to achieving genuinely transformative task-oriented dialogue capabilities remains the scarcity of high quality data. Existing datasets, while impressive in scale, have limited domain coverage and contain few genuinely challenging conversational phenomena; those which are present are typically unlabelled, making it difficult to assess the strengths and weaknesses of models without time-consuming and costly human evaluation. Moreover, creating high quality dialogue data has until now required considerable human input, limiting both the scale of these datasets and the ability to rapidly bootstrap data for a new target domain. We aim to overcome these issues with LUCID, a modularised and highly automated LLM-driven data generation system that produces realistic, diverse and challenging dialogues. We use LUCID to generate a seed dataset of 4,277 conversations across 100 intents to demonstrate its capabilities, with a human review finding consistently high quality labels in the generated data.

2020

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.

2016

An empirically grounded expansion of the supersense inventory
Hector Martinez Alonso | Anders Johannsen | Sanni Nimb | Sussi Olsen | Bolette Pedersen
Proceedings of the 8th Global WordNet Conference (GWC)

In this article we present an expansion of the supersense inventory. All new super-senses are extensions of members of the current inventory, which we postulate by identifying semantically coherent groups of synsets. We cover the expansion of the already-established supernsense inventory for nouns and verbs, the addition of coarse supersenses for adjectives in absence of a canonical supersense inventory, and super-senses for verbal satellites. We evaluate the viability of the new senses examining the annotation agreement, frequency and co-ocurrence patterns.

The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
Bolette Pedersen | Anna Braasch | Anders Johannsen | Héctor Martínez Alonso | Sanni Nimb | Sussi Olsen | Anders Søgaard | Nicolai Hartvig Sørensen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We launch the SemDaX corpus which is a recently completed Danish human-annotated corpus available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different granularity. The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish. To these aims, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than what is normally seen: for the all-words task we double annotated 60% of the material and for the lexical sample task 100%. We include in the corpus not only the adjucated files, but also the diverging annotations. In other words, we consider not all disagreement to be noise, but rather to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.

Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics
Dirk Hovy | Anders Johannsen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Language varies not only between countries, but also along regional and socio-demographic lines. This variation is one of the driving factors behind language change. However, investigating language variation is a complex undertaking: the more factors we want to consider, the more data we need. Traditional qualitative methods are not well-suited to do this, an therefore restricted to isolated factors. This reduction limits the potential insights, and risks attributing undue importance to easily observed factors. While there is a large interest in linguistics to increase the quantitative aspect of such studies, it requires training in both variational linguistics and computational methods, a combination that is still not common. We take a first step here to alleviating the problem by providing an interface, www.languagevariation.com, to explore large-scale language variation along multiple socio-demographic factors – without programming knowledge. It makes use of large amounts of data and provides statistical analyses, maps, and interactive features that will enable scholars to explore language variation in a data-driven way.

Joint part-of-speech and dependency projection from multiple sources
Anders Johannsen | Željko Agić | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Multilingual Projection for Parsing Truly Low-Resource Languages
Željko Agić | Anders Johannsen | Barbara Plank | Héctor Martínez Alonso | Natalie Schluter | Anders Søgaard
Transactions of the Association for Computational Linguistics, Volume 4

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
Nathan Schneider | Dirk Hovy | Anders Johannsen | Marine Carpuat
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Supersense tagging with inter-annotator disagreement
Héctor Martínez Alonso | Anders Johannsen | Barbara Plank
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

2015

Any-language frame-semantic parsing
Anders Johannsen | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Cross-lingual syntactic variation over age and gender
Anders Johannsen | Dirk Hovy | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Inverted indexing for cross-lingual NLP
Anders Søgaard | Željko Agić | Héctor Martínez Alonso | Barbara Plank | Bernd Bohnet | Anders Johannsen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Supersense tagging for Danish
Héctor Martínez Alonso | Anders Johannsen | Sussi Olsen | Sanni Nimb | Nicolai Hartvig Sørensen | Anna Braasch | Anders Søgaard | Bolette Sandford Pedersen
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Active learning for sense annotation
Héctor Martínez Alonso | Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Coarse-grained sense annotation of Danish across textual domains
Sussi Olsen | Bolette S. Pedersen | Héctor Martínez Alonso | Anders Johannsen
Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015

Predicting word sense annotation agreement
Héctor Martínez Alonso | Anders Johannsen | Oier Lopez de Lacalle | Eneko Agirre
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

2014

Importance weighting and unsupervised domain adaptation of POS taggers: a negative result
Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

More or less supervised supersense tagging of Twitter
Anders Johannsen | Dirk Hovy | Héctor Martínez Alonso | Barbara Plank | Anders Søgaard
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
Natalie Schluter | Anders Søgaard | Jakob Elming | Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Johanssen | Sigrid Klerke
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

What’s in a p-value in NLP?
Anders Søgaard | Anders Johannsen | Barbara Plank | Dirk Hovy | Hector Martínez Alonso
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

Using Crowdsourcing to get Representations based on Regular Expressions
Anders Søgaard | Hector Martinez | Jakob Elming | Anders Johannsen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Cross-Domain Answer Ranking using Importance Sampling
Anders Johannsen | Anders Søgaard
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Disambiguating Explicit Discourse Connectives without Oracles
Anders Johannsen | Anders Søgaard
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Down-stream effects of tree-to-dependency conversions
Jakob Elming | Anders Johannsen | Sigrid Klerke | Emanuele Lapponi | Hector Martinez Alonso | Anders Søgaard
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Robust Learning in Random Subspaces: Equipping NLP for OOV Effects
Anders Søgaard | Anders Johannsen
Proceedings of COLING 2012: Posters

Creation and use of Language Resources in a Question-Answering eHealth System
Ulrich Andersen | Anna Braasch | Lina Henriksen | Csaba Huszka | Anders Johannsen | Lars Kayser | Bente Maegaard | Ole Norgaard | Stefan Schulz | Jürgen Wedekind
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

ESICT (Experience-oriented Sharing of health knowledge via Information and Communication Technology) is an ongoing research project funded by the Danish Council for Strategic Research. It aims at developing a health/disease related information system based on information technology, language technology, and formalized medical knowledge. The formalized medical knowledge consists partly of the terminology database SNOMED CT and partly of authorized medical texts on the domain. The system will allow users to ask questions in Danish and will provide natural language answers. Currently, the project is pursuing three basically different methods for question answering, and they are all described to some extent in this paper. A system prototype will handle questions related to diabetes and heart diseases. This paper concentrates on the methods employed for question answering and the language resources that are utilized. Some resources were existing, such as SNOMED CT, others, such as a corpus of sample questions, have had to be created or constructed.

EMNLP@CPH: Is frequency all there is to simplicity?
Anders Johannsen | Héctor Martínez | Sigrid Klerke | Anders Søgaard
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Shared Task System Description: Frustratingly Hard Compositionality Prediction
Anders Johannsen | Hector Martinez | Christian Rishøj | Anders Søgaard
Proceedings of the Workshop on Distributional Semantics and Compositionality

“Andre ord” – a wordnet browser for the Danish wordnet, DanNet
Anders Johannsen | Bolette Sandford Pedersen
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Co-authors

Venues