Luciana Benotti


2022

What kinds of errors do reference resolution models make and what can we learn from them?
Jorge Sánchez | Mauricio Mazuecos | Hernán Maina | Luciana Benotti
Findings of the Association for Computational Linguistics: NAACL 2022

Reference resolution is the task of identifying the referent of a natural language expression, for example “the woman behind the other woman getting a massage”. In this paper we investigate the kinds of referring expressions on which current transformer-based models fail. Motivated by this analysis, we identify the weakening of natural spatial constraints as one of the causes of these failures and propose a model that aims to restore them. We evaluate our proposed model on different datasets for the task, showing improved performance on the most challenging kinds of referring expressions. Finally, we present a thorough analysis of the kinds of errors that are reduced by the new model and of those that are not and remain future challenges for the task.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Luciana Benotti | Naoaki Okazaki | Yves Scherrer | Marcos Zampieri

2021

Region under Discussion for visual dialog
Mauricio Mazuecos | Franco M. Luque | Jorge Sánchez | Hernán Maina | Thomas Vadora | Luciana Benotti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Visual Dialog is assumed to require the dialog history to generate correct responses during a dialog. However, previous work does not make clear how dialog history is needed for visual dialog. In this paper we define what it means for a visual question to require dialog history, and we release a subset of the GuessWhat?! questions whose answers are completely changed by their dialog history. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image’s spatial features according to a semantic representation of the history inspired by the information structure notion of Question under Discussion. We evaluate the architecture on task-specific multimodal models and on the visual transformer model LXMERT.

The Impact of Answers in Referential Visual Dialog
Mauricio Mazuecos | Patrick Blackburn | Luciana Benotti
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

In the visual dialog task GuessWhat?!, two players maintain a dialog in order to identify a secret object in an image. Computationally, this is modeled using a question generation module and a guesser module for the questioner role, and an answering model, the Oracle, to answer the generated questions. This raises a question: what is the risk of having an imperfect Oracle model? Here we present work in progress on the impact of different answering models on human-generated questions in GuessWhat?!. We show that having access to better-quality answers has a direct impact on the guessing task for human dialogs, and we argue that better answers could help train better question generation models.

Visually Grounded Follow-up Questions: a Dataset of Spatial Questions Which Require Dialogue History
Tianai Dong | Alberto Testoni | Luciana Benotti | Raffaella Bernardi
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

In this paper, we define and evaluate a methodology for extracting history-dependent spatial questions from visual dialogues. We say that a question is history-dependent if it requires (parts of) its dialogue history to be interpreted. We argue that some kinds of visual questions define a context upon which a follow-up spatial question relies. We call the question that restricts the context the trigger, and the spatial question that requires the trigger question in order to be answered the zoomer. We automatically extract different trigger and zoomer pairs based on the visual property that the questions rely on (e.g. color, number). We manually annotate the automatically extracted trigger and zoomer pairs to verify which zoomers require their trigger. We implement a simple baseline architecture based on a state-of-the-art multimodal encoder. Our results reveal that there is much room for improvement in answering history-dependent questions.

Grounding as a Collaborative Process
Luciana Benotti | Patrick Blackburn
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Collaborative grounding is a fundamental aspect of human-human dialog that allows people to negotiate meaning. In this paper we argue that it is missing from current deep learning approaches to dialog. Our central point is that making mistakes and being able to recover from them collaboratively is a key ingredient in grounding meaning. We illustrate the pitfalls of being unable to ground collaboratively, discuss what can be learned from the language acquisition and dialog systems literature, and reflect on how to move forward.

A recipe for annotating grounded clarifications
Luciana Benotti | Patrick Blackburn
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

To interpret the communicative intents of an utterance, we need to ground it in something outside of language, that is, in world modalities. In this paper, we argue that dialogue clarification mechanisms make explicit the process of interpreting the communicative intents of the speaker’s utterances by grounding them in the various modalities in which the dialogue is situated. This paper frames dialogue clarification mechanisms as an understudied research problem and a key missing piece in the giant jigsaw puzzle of natural language understanding. We discuss both the theoretical background and the practical challenges posed by this problem and propose a recipe for obtaining grounding annotations. We conclude by highlighting ethical issues that need to be addressed in future work.

2020

They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies
Alberto Testoni | Claudio Greco | Tobias Bianchi | Mauricio Mazuecos | Agata Marcante | Luciana Benotti | Raffaella Bernardi
Proceedings of the Third International Workshop on Spatial Language Understanding

In this paper, we study the grounding skills required to answer spatial questions asked by humans while playing the GuessWhat?! game. We propose a classification of spatial questions into absolute, relational, and group questions. We build a new answerer model based on the LXMERT multimodal transformer, and we compare a baseline with and without visual features of the scene. We are interested in studying how the attention mechanisms of LXMERT are used to answer spatial questions, since such questions require attending to more than one region simultaneously and identifying the relation that holds among them. We show that our proposed model outperforms the baseline by a large margin (9.70% on spatial questions and 6.27% overall). By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.

Effective questions in referential visual dialogue
Mauricio Mazuecos | Alberto Testoni | Raffaella Bernardi | Luciana Benotti
Proceedings of the Fourth Widening Natural Language Processing Workshop

An interesting challenge for situated dialogue systems is referential visual dialog: by asking questions, the system has to identify the referent the user refers to. Task success is the standard metric used to evaluate these systems. However, it does not consider how effective each question is, that is, how much each question contributes to the goal. We propose a new metric that measures question effectiveness. As a preliminary study, we report the new metric for state-of-the-art publicly available models on GuessWhat?!. Surprisingly, successful dialogues do not have a higher percentage of effective questions than failed dialogues. This suggests that a system with high task success is not necessarily one that generates good questions.

On the role of effective and referring questions in GuessWhat?!
Mauricio Mazuecos | Alberto Testoni | Raffaella Bernardi | Luciana Benotti
Proceedings of the First Workshop on Advances in Language and Vision Research

Task success is the standard metric used to evaluate referential visual dialogue systems. In this paper we propose two new metrics that evaluate how each question contributes to the goal. First, we measure how effective each question is by evaluating whether the question discards objects that are not the referent. Second, we define referring questions as those that univocally identify one object in the image. We report the new metrics for human dialogues and for state-of-the-art publicly available models on GuessWhat?!. Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models. With respect to the second metric, humans ask referring questions at the end of the dialogue, confirming their hypothesis before making the final guess. Human dialogues that use this strategy have higher task success, but models do not seem to learn it.

2018

Modeling Student Response Times: Towards Efficient One-on-one Tutoring Dialogues
Luciana Benotti | Jayadev Bhaskaran | Sigtryggur Kjartansson | David Lang
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

In this paper we investigate the task of modeling how long it would take a student to respond to a tutor question during a tutoring dialogue. Solving such a task has applications in educational settings such as intelligent tutoring systems, as well as in platforms that help busy human tutors to keep students engaged. Knowing how long it would normally take a student to respond to different types of questions could help tutors optimize their own time while answering multiple dialogues concurrently, as well as deciding when to prompt a student again. We study this problem using data from a service that offers tutor support for math, chemistry and physics through an instant messaging platform. We create a dataset of 240K questions. We explore several strong baselines for this task and compare them with human performance.

2015

Zoom: a corpus of natural language descriptions of map locations
Romina Altamirano | Thiago Ferreira | Ivandré Paraboni | Luciana Benotti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

A Natural Language Instructor for pedestrian navigation based in generation by selection
Santiago Avalos | Luciana Benotti
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

Mining human interactions to construct a virtual guide for a virtual fair
Andrés Luna | Luciana Benotti
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

2012

Corpus-based Interpretation of Instructions in Virtual Environments
Luciana Benotti | Martín Villalba | Tessa Lau | Julián Cerruti
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Probabilistic Refinement Algorithms for the Generation of Referring Expressions
Romina Altamirano | Carlos Areces | Luciana Benotti
Proceedings of COLING 2012: Posters

2011

Prototyping virtual instructors from human-human corpora
Luciana Benotti | Alexandre Denis
Proceedings of the ACL-HLT 2011 System Demonstrations

Giving instructions in virtual environments by corpus based selection
Luciana Benotti | Alexandre Denis
Proceedings of the SIGDIAL 2011 Conference

The GIVE-2.5 C Generation System
David Nicolás Racca | Luciana Benotti | Pablo Duboue
Proceedings of the 13th European Workshop on Natural Language Generation

CL system: Giving instructions by corpus based selection
Luciana Benotti | Alexandre Denis
Proceedings of the 13th European Workshop on Natural Language Generation

2010

Dialogue Systems for Virtual Environments
Luciana Benotti | Paula Estrella | Carlos Areces
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

Negotiating causal implicatures
Luciana Benotti | Patrick Blackburn
Proceedings of the SIGDIAL 2010 Conference

2009

Frolog: an Accommodating Text-Adventure Game
Luciana Benotti
Proceedings of the Demonstrations Session at EACL 2009

A computational account of comparative implicatures for a spoken dialogue agent
Luciana Benotti | David Traum
Proceedings of the Eighth International Conference on Computational Semantics

Clarification Potential of Instructions
Luciana Benotti
Proceedings of the SIGDIAL 2009 Conference