Youmna Farag
2026
Context-Aware Language Understanding in Human-Robot Dialogue with LLMs
Svetlana Stoyanchev | Youmna Farag | Simon Keizer | Mohan Li | Rama Sanand Doddipatla
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
In this work, we explore the use of large language models (LLMs) as interpreters of user utterances within a human-robot language interface. A user interacting with a robot that operates in a physical environment should be able to issue commands that interrupt the robot’s actions, for example, corrections or refinements of the task. This study addresses the context-aware interpretation of user utterances, including those issued while the robot is actively engaged in task execution, exploring whether LLMs, without fine-tuning, can translate user commands into corresponding sequences of robot actions. Using an interactive multimodal interface—combining text and video—for a virtual robot operating in simulated home environments, we collect a dataset of user utterances that guide the robot through various household tasks, simultaneously capturing manual interpretations when the automatic ones fail. Driven by practical considerations, the collected dataset is used to compare the interpretive performance of GPT models with smaller publicly available alternatives. Our findings reveal that action-interrupting utterances pose challenges for all models. While GPT consistently outperforms the smaller models, interpretation accuracy improves across the board when relevant, dynamically selected in-context learning examples are included in the prompt.
2025
Conditional Multi-Stage Failure Recovery for Embodied Agents
Youmna Farag | Svetlana Stoyanchev | Mohan Li | Simon Keizer | Rama Doddipatla
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Embodied agents performing complex tasks are susceptible to execution failures, motivating the need for effective failure recovery mechanisms. In this work, we introduce a conditional multi-stage failure recovery framework that employs zero-shot chain prompting. The framework is structured into four error-handling stages, with three operating during task execution and one functioning as a post-execution reflection phase. Our approach utilises the reasoning capabilities of LLMs to analyse execution challenges within their environmental context and devise strategic solutions. We evaluate our method on the TfD benchmark of the TEACH dataset and achieve state-of-the-art performance, outperforming a baseline without error recovery by 11.5% and surpassing the strongest existing model by 19%.
2024
An LLM Feature-based Framework for Dialogue Constructiveness Assessment
Lexin Zhou | Youmna Farag | Andreas Vlachos
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructiveness outcomes following dialogues for such use cases. These objectives can be achieved by training either interpretable feature-based models (which often require costly human annotations) or neural models such as pre-trained language models (which have empirically shown higher task accuracy but lack interpretability). In this paper we propose an LLM feature-based framework for dialogue constructiveness assessment that combines the strengths of feature-based and neural approaches, while mitigating their downsides. The framework first defines a set of dataset-independent and interpretable linguistic features, which can be extracted both by prompting an LLM and via simple heuristics. Such features are then used to train LLM feature-based models. We apply this framework to three datasets of dialogue constructiveness and find that our LLM feature-based models outperform or perform at least as well as standard feature-based models and neural models. We also find that the LLM feature-based model learns more robust prediction rules instead of relying on superficial shortcuts, which often trouble neural models.
2022
Opening up Minds with Argumentative Dialogues
Youmna Farag | Charlotte Brand | Jacopo Amidei | Paul Piwek | Tom Stafford | Svetlana Stoyanchev | Andreas Vlachos
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent research on argumentative dialogues has focused on persuading people to take some action, changing their stance on the topic of discussion, or winning debates. In this work, we focus on argumentative dialogues that aim to open up (rather than change) people’s minds to help them become more understanding of views that are unfamiliar or in opposition to their own convictions. To this end, we present a dataset of 183 argumentative dialogues about 3 controversial topics: veganism, Brexit and COVID-19 vaccination. The dialogues were collected using the Wizard of Oz approach, where wizards leverage a knowledge base of arguments to converse with participants. Open-mindedness is measured before and after engaging in the dialogue using a questionnaire from the psychology literature, and success of the dialogue is measured as the change in the participant’s stance towards those who hold opinions different to theirs. We evaluate two dialogue models: a Wikipedia-based and an argument-based model. We show that while both models perform closely in terms of opening up minds, the argument-based model is significantly better on other dialogue properties such as engagement and clarity.
2020
Analyzing Neural Discourse Coherence Models
Youmna Farag | Josef Valvoda | Helen Yannakoudakis | Ted Briscoe
Proceedings of the First Workshop on Computational Approaches to Discourse
In this work, we systematically investigate how well current models of coherence can capture aspects of text implicated in discourse organisation. We devise two datasets of various linguistic alterations that undermine coherence, and test model sensitivity to changes in syntax and semantics. We furthermore probe the discourse embedding space and examine the knowledge that is encoded in representations of coherence. We hope this study will provide further insight into how to frame the task and improve models of coherence assessment. Finally, we make our datasets publicly available as a resource for researchers to test discourse coherence models.
2019
Multi-Task Learning for Coherence Modeling
Youmna Farag | Helen Yannakoudakis
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
We address the task of assessing discourse coherence, an aspect of text quality that is essential for many NLP tasks, such as summarization and language assessment. We propose a hierarchical neural network trained in a multi-task fashion that learns to predict a document-level coherence score (at the network’s top layers) along with word-level grammatical roles (at the bottom layers), taking advantage of inductive transfer between the two tasks. We assess the extent to which our framework generalizes to different domains and prediction tasks, and demonstrate its effectiveness not only on standard binary evaluation coherence tasks, but also on real-world tasks involving the prediction of varying degrees of coherence, achieving a new state of the art.
2018
Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input
Youmna Farag | Helen Yannakoudakis | Ted Briscoe
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.
2017
An Error-Oriented Approach to Word Embedding Pre-Training
Youmna Farag | Marek Rei | Ted Briscoe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
We propose a novel word embedding pre-training approach that exploits writing errors in learners’ scripts. We compare our method to previous models that tune the embeddings based on script scores and on the discrimination between correct and corrupt word contexts, in addition to the generic, commonly used embeddings pre-trained on large corpora. The comparison is achieved by using the aforementioned models to bootstrap a neural network that learns to predict a holistic score for scripts. Furthermore, we investigate augmenting our model with error corrections and monitor the impact on performance. Our results show that our error-oriented approach outperforms other comparable ones, which is further demonstrated when training on more data. Additionally, extending the model with corrections provides further performance gains when data sparsity is an issue.