Alex Lascarides

2026

Contrastive Learning with Narrative Twins for Modeling Story Salience
Igor Sterner | Alex Lascarides | Frank Keller
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Understanding narratives requires identifying which events are most salient for a story’s progression. We present a contrastive learning framework for modeling narrative salience that learns story embeddings from narrative twins: stories that share the same plot but differ in surface form. Our model is trained to distinguish a story from both its narrative twin and a distractor with similar surface features but different plot. Using the resulting embeddings, we evaluate four narratologically motivated operations for inferring salience (deletion, shifting, disruption, and summarization). Experiments on short narratives from the ROCStories corpus and longer Wikipedia plot summaries show that contrastively learned story embeddings outperform a masked-language-model baseline, and that summarization is the most reliable operation for identifying salient sentences. If narrative twins are not available, random dropout can be used to generate the twins from a single story. Effective distractors can be obtained either by prompting LLMs or, in long-form narratives, by using different parts of the same story.

2023

Learning the Effects of Physical Actions in a Multi-modal Environment
Gautier Dagan | Frank Keller | Alex Lascarides
Findings of the Association for Computational Linguistics: EACL 2023

Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action’s outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model’s performance on novel actions and objects and find that combining modalities helps models to generalize and learn physical commonsense reasoning better.

Interactive Acquisition of Fine-grained Visual Concepts by Exploiting Semantics of Generic Characterizations in Discourse
Jonghyuk Park | Alex Lascarides | Subramanian Ramamoorthy
Proceedings of the 15th International Conference on Computational Semantics

Interactive Task Learning (ITL) concerns learning about unforeseen domain concepts via natural interactions with human users. The learner faces a number of significant constraints: learning should be online, incremental and few-shot, as it is expected to perform tangible belief updates right after novel words denoting unforeseen concepts are introduced. In this work, we explore a challenging symbol grounding task—discriminating among object classes that look very similar—within the constraints imposed by ITL. We demonstrate empirically that more data-efficient grounding results from exploiting the truth-conditions of the teacher’s generic statements (e.g., “Xs have attribute Z.”) and their implicatures in context (e.g., as an answer to “How are Xs and Ys different?”, one infers Y lacks attribute Z).

Dialogue-based generation of self-driving simulation scenarios using Large Language Models
Antonio Valerio Miceli Barone | Craig Innes | Alex Lascarides
Proceedings of the 3rd Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP 2023)

Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars. Current simulation frameworks are driven by highly specialist domain-specific languages, and so a natural language interface would greatly enhance usability. But there is often a gap, consisting of tacit assumptions the user is making, between a concise English utterance and the executable code that captures the user’s intent. In this paper we describe a system that addresses this issue by supporting an extended multimodal interaction: the user can follow up prior instructions with refinements or revisions, in reaction to the simulations that have been generated from their utterances so far. We use Large Language Models (LLMs) to map the user’s English utterances in this interaction into domain-specific code, and so we explore the extent to which LLMs capture the context sensitivity that’s necessary for computing the speaker’s intended message in discourse.

2022

Interactive Symbol Grounding with Complex Referential Expressions
Rimvydas Rubavicius | Alex Lascarides
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present a procedure for learning to ground symbols from a sequence of stimuli consisting of an arbitrarily complex noun phrase (e.g., “all but one green square above both red circles.”) and its designation in the visual scene. Our distinctive approach combines: a) lazy few-shot learning to relate open-class words like green and above to their visual percepts; and b) symbolic reasoning with closed-class word categories like quantifiers and negation. We use this combination to estimate new training examples for grounding symbols that occur within a noun phrase but aren’t designated by that noun phrase (e.g., red in the above example), thereby potentially gaining data efficiency. We evaluate the approach in a visual reference resolution task, in which the learner starts out unaware of concepts that are part of the domain model and how they relate to visual percepts.

2021

Symbol Grounding and Task Learning from Imperfect Corrections
Mattias Appelgren | Alex Lascarides
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

This paper describes a method for learning from a teacher’s potentially unreliable corrective feedback in an interactive task learning setting. The graphical model uses discourse coherence to jointly learn symbol grounding, domain concepts and valid plans. Our experiments show that the agent learns its domain-level task in spite of the teacher’s mistakes.

2017

Evaluating Persuasion Strategies and Deep Reinforcement Learning methods for Negotiation Dialogue agents
Simon Keizer | Markus Guhe | Heriberto Cuayáhuitl | Ioannis Efstathiou | Klaus-Peter Engelbrecht | Mihai Dobre | Alex Lascarides | Oliver Lemon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we present a comparative evaluation of various negotiation strategies within an online version of the game “Settlers of Catan”. The comparison is based on human subjects playing games against artificial game-playing agents (‘bots’) which implement different negotiation dialogue strategies, using a chat dialogue interface to negotiate trades. Our results suggest that a negotiation strategy that uses persuasion, as well as a strategy that is trained from data using Deep Reinforcement Learning, both lead to an improved win rate against humans, compared to previous rule-based and supervised learning baseline dialogue negotiators.

Grounding Symbols in Multi-Modal Instructions
Yordan Hristov | Svetlin Penkov | Alex Lascarides | Subramanian Ramamoorthy
Proceedings of the First Workshop on Language Grounding for Robotics

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability—for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user’s contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input—i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations—to produce the segmentation of objects with a corresponding association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show our model learns the user’s notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.
