Matthias Scheutz


pdf bib
Automating Dataset Production Using Generative Text and Image Models
Christopher Thierauf | Mitchell Abrams | Matthias Scheutz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Practical and ethical dataset collection remains a challenge blocking many empirical methods in natural language processing, resulting in a lack of benchmarks or data on which to test hypotheses. We propose a solution to some of these areas by presenting a pipeline to reduce the research burden of producing image and text datasets when datasets may not exist. Our approach, with accompanying software tools, involves (1) generating text with LLMs; (2) creating accompanying image vignettes with text–to–image transformers; and (3) low-cost human validation. Based on existing literature that has struggled with quantitative evaluation (due to difficulty of data collection), we present the creation of 3 relevant datasets, and conduct a user study that demonstrates this approach is able to aid researchers in obtaining previously-challenging datasets. We provide sample data generated with this technique, the source code used to produce it, and discuss applicability and limitations.


pdf bib
A System For Robot Concept Learning Through Situated Dialogue
Benjamin Kane | Felix Gervits | Matthias Scheutz | Matthew Marge
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Robots operating in unexplored environments with human teammates will need to learn unknown concepts on the fly. To this end, we demonstrate a novel system that combines a computational model of question generation with a cognitive robotic architecture. The model supports dynamic production of back-and-forth dialogue for concept learning given observations of an environment, while the architecture supports symbolic reasoning, action representation, one-shot learning and other capabilities for situated interaction. The system is able to learn about new concepts including objects, locations, and actions, using an underlying approach that is generalizable and scalable. We evaluate the system by comparing learning efficiency to a human baseline in a collaborative reference resolution task and show that the system is effective and efficient in learning new concepts, and that it can informatively generate explanations about its behavior.

pdf bib
Social Norms Guide Reference Resolution
Mitchell Abrams | Matthias Scheutz
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Humans use natural language, vision, and context to resolve referents in their environment. While some situated reference resolution is trivial, ambiguous cases arise when the language is underspecified or there are multiple candidate referents. This study investigates howpragmatic modulators external to the linguistic content are critical for the correct interpretation of referents in these scenarios. Inparticular, we demonstrate in a human subjects experiment how the social norms applicable in the given context influence theinterpretation of referring expressions. Additionally, we highlight how current coreference tools in natural language processing fail tohandle these ambiguous cases. We also briefly discuss the implications of this work for assistive robots which will routinely need to resolve referents in their environment.


pdf bib
How Should Agents Ask Questions For Situated Learning? An Annotated Dialogue Corpus
Felix Gervits | Antonio Roque | Gordon Briggs | Matthias Scheutz | Matthew Marge
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Intelligent agents that are confronted with novel concepts in situated environments will need to ask their human teammates questions to learn about the physical world. To better understand this problem, we need data about asking questions in situated task-based interactions. To this end, we present the Human-Robot Dialogue Learning (HuRDL) Corpus - a novel dialogue corpus collected in an online interactive virtual environment in which human participants play the role of a robot performing a collaborative tool-organization task. We describe the corpus data and a corresponding annotation scheme to offer insight into the form and content of questions that humans ask to facilitate learning in a situated environment. We provide the corpus as an empirically-grounded resource for improving question generation in situated intelligent agents.


pdf bib
Generating Explanations of Action Failures in a Cognitive Robotic Architecture
Ravenna Thielstrom | Antonio Roque | Meia Chita-Tegmark | Matthias Scheutz
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

We describe an approach to generating explanations about why robot actions fail, focusing on the considerations of robots that are run by cognitive robotic architectures. We define a set of Failure Types and Explanation Templates, motivating them by the needs and constraints of cognitive architectures that use action scripts and interpretable belief states, and describe content realization and surface realization in this context. We then describe an evaluation that can be extended to further study the effects of varying the explanation templates.

pdf bib
Developing a Corpus of Indirect Speech Act Schemas
Antonio Roque | Alexander Tsuetaki | Vasanth Sarathy | Matthias Scheutz
Proceedings of the Twelfth Language Resources and Evaluation Conference

Resolving Indirect Speech Acts (ISAs), in which the intended meaning of an utterance is not identical to its literal meaning, is essential to enabling the participation of intelligent systems in peoples’ everyday lives. Especially challenging are those cases in which the interpretation of such ISAs depends on context. To test a system’s ability to perform ISA resolution we need a corpus, but developing such a corpus is difficult, especialy given the contex-dependent requirement. This paper addresses the difficult problems of constructing a corpus of ISAs, taking inspiration from relevant work in using corpora for reasoning tasks. We present a formal representation of ISA Schemas required for such testing, including a measure of the difficulty of a particular schema. We develop an approach to authoring these schemas using corpus analysis and crowdsourcing, to maximize realism and minimize the amount of expert authoring needed. Finally, we describe several characteristics of collected data, and potential future work.

pdf bib
Reasoning Requirements for Indirect Speech Act Interpretation
Vasanth Sarathy | Alexander Tsuetaki | Antonio Roque | Matthias Scheutz
Proceedings of the 28th International Conference on Computational Linguistics

We perform a corpus analysis to develop a representation of the knowledge and reasoning used to interpret indirect speech acts. An indirect speech act (ISA) is an utterance whose intended meaning is different from its literal meaning. We focus on those speech acts in which slight changes in situational or contextual information can switch the dominant intended meaning of an utterance from direct to indirect or vice-versa. We computationalize how various contextual features can influence a speaker’s beliefs, and how these beliefs can influence the intended meaning and choice of the surface form of an utterance. We axiomatize the domain-general patterns of reasoning involved, and implement a proof-of-concept architecture using Answer Set Programming. Our model is presented as a contribution to cognitive science and psycholinguistics, so representational decisions are justified by existing theoretical work.

pdf bib
It’s About Time: Turn-Entry Timing For Situated Human-Robot Dialogue
Felix Gervits | Ravenna Thielstrom | Antonio Roque | Matthias Scheutz
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Turn-entry timing is an important requirement for conversation, and one that spoken dialogue systems largely fail at. In this paper, we introduce a computational framework based on work from Psycholinguistics, which is aimed at achieving proper turn-taking timing for situated agents. The approach involves incremental processing and lexical prediction of the turn in progress, which allows a situated dialogue system to start its turn and initiate actions earlier than would otherwise be possible. We evaluate the framework by integrating it within a cognitive robotic architecture and testing performance on a corpus of task-oriented human-robot directives. We demonstrate that: 1) the system is superior to a non-incremental system in terms of faster responses, reduced gap between turns, and the ability to perform actions early, 2) the system can time its turn to come in immediately at a transition point or earlier to produce several types of overlap, and 3) the system is robust to various forms of disfluency in the input. Overall, this domain-independent framework can be integrated into various dialogue systems to improve responsiveness, and is a step toward more natural, human-like turn-taking behavior.


pdf bib
Engaging in Dialogue about an Agent’s Norms and Behaviors
Daniel Kasenberg | Antonio Roque | Ravenna Thielstrom | Matthias Scheutz
Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019)

pdf bib
Generating justifications for norm-related agent decisions
Daniel Kasenberg | Antonio Roque | Ravenna Thielstrom | Meia Chita-Tegmark | Matthias Scheutz
Proceedings of the 12th International Conference on Natural Language Generation

We present an approach to generating natural language justifications of decisions derived from norm-based reasoning. Assuming an agent which maximally satisfies a set of rules specified in an object-oriented temporal logic, the user can ask factual questions (about the agent’s rules, actions, and the extent to which the agent violated the rules) as well as “why” questions that require the agent comparing actual behavior to counterfactual trajectories with respect to these rules. To produce natural-sounding explanations, we focus on the subproblem of producing natural language clauses from statements in a fragment of temporal logic, and then describe how to embed these clauses into explanatory sentences. We use a human judgment evaluation on a testbed task to compare our approach to variants in terms of intelligibility, mental model and perceived trust.


pdf bib
Towards a Conversation-Analytic Taxonomy of Speech Overlap
Felix Gervits | Matthias Scheutz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Pardon the Interruption: Managing Turn-Taking through Overlap Resolution in Embodied Artificial Agents
Felix Gervits | Matthias Scheutz
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Speech overlap is a common phenomenon in natural conversation and in task-oriented interactions. As human-robot interaction (HRI) becomes more sophisticated, the need to effectively manage turn-taking and resolve overlap becomes more important. In this paper, we introduce a computational model for speech overlap resolution in embodied artificial agents. The model identifies when overlap has occurred and uses timing information, dialogue history, and the agent’s goals to generate context-appropriate behavior. We implement this model in a Nao robot using the DIARC cognitive robotic architecture. The model is evaluated on a corpus of task-oriented human dialogue, and we find that the robot can replicate many of the most common overlap resolution behaviors found in the human data.

pdf bib
Sensitivity to Input Order: Evaluation of an Incremental and Memory-Limited Bayesian Cross-Situational Word Learning Model
Sepideh Sadeghi | Matthias Scheutz
Proceedings of the 27th International Conference on Computational Linguistics

We present a variation of the incremental and memory-limited algorithm in (Sadeghi et al., 2017) for Bayesian cross-situational word learning and evaluate the model in terms of its functional performance and its sensitivity to input order. We show that the functional performance of our sub-optimal model on corpus data is close to that of its optimal counterpart (Frank et al., 2009), while only the sub-optimal model is capable of predicting the input order effects reported in experimental studies.


pdf bib
Referring Expression Generation under Uncertainty: Algorithm and Evaluation Framework
Tom Williams | Matthias Scheutz
Proceedings of the 10th International Conference on Natural Language Generation

For situated agents to effectively engage in natural-language interactions with humans, they must be able to refer to entities such as people, locations, and objects. While classic referring expression generation (REG) algorithms like the Incremental Algorithm (IA) assume perfect, complete, and accessible knowledge of all referents, this is not always possible. In this work, we show how a previously presented consultant framework (which facilitates reference resolution when knowledge is uncertain, heterogeneous and distributed) can be used to extend the IA to produce DIST-PIA, a domain-independent algorithm for REG under uncertain, heterogeneous, and distributed knowledge. We also present a novel framework that can be used to evaluate such REG algorithms without conflating the performance of the algorithm with the performance of classifiers it employs.

pdf bib
Creating POS Tagging and Dependency Parsing Experts via Topic Modeling
Atreyee Mukherjee | Sandra Kübler | Matthias Scheutz
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Part of speech (POS) taggers and dependency parsers tend to work well on homogeneous datasets but their performance suffers on datasets containing data from different genres. In our current work, we investigate how to create POS tagging and dependency parsing experts for heterogeneous data by employing topic modeling. We create topic models (using Latent Dirichlet Allocation) to determine genres from a heterogeneous dataset and then train an expert for each of the genres. Our results show that the topic modeling experts reach substantial improvements when compared to the general versions. For dependency parsing, the improvement reaches 2 percent points over the full training baseline when we use two topics.


pdf bib
Disfluent but effective? A quantitative study of disfluencies and conversational moves in team discourse
Felix Gervits | Kathleen Eberhard | Matthias Scheutz
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Situated dialogue systems that interact with humans as part of a team (e.g., robot teammates) need to be able to use information from communication channels to gauge the coordination level and effectiveness of the team. Currently, the feasibility of this end goal is limited by several gaps in both the empirical and computational literature. The purpose of this paper is to address those gaps in the following ways: (1) investigate which properties of task-oriented discourse correspond with effective performance in human teams, and (2) discuss how and to what extent these properties can be utilized in spoken dialogue systems. To this end, we analyzed natural language data from a unique corpus of spontaneous, task-oriented dialogue (CReST corpus), which was annotated for disfluencies and conversational moves. We found that effective teams made more self-repair disfluencies and used specific communication strategies to facilitate grounding and coordination. Our results indicate that truly robust and natural dialogue systems will need to interpret highly disfluent utterances and also utilize specific collaborative mechanisms to facilitate grounding. These data shed light on effective communication in performance scenarios and directly inform the development of robust dialogue systems for situated artificial agents.

pdf bib
POS Tagging Experts via Topic Modeling
Atreyee Mukherjee | Sandra Kübler | Matthias Scheutz
Proceedings of the 13th International Conference on Natural Language Processing


pdf bib
Modeling Blame to Avoid Positive Face Threats in Natural Language Generation
Gordon Briggs | Matthias Scheutz
Proceedings of the INLG and SIGDIAL 2014 Joint Session


pdf bib
Facilitating Mental Modeling in Collaborative Human-Robot Interaction through Adverbial Cues
Gordon Briggs | Matthias Scheutz
Proceedings of the SIGDIAL 2011 Conference

pdf bib
Actions Speak Louder than Words: Evaluating Parsers in the Context of Natural Language Understanding Systems for Human-Robot Interaction
Sandra Kübler | Rachael Cantrell | Matthias Scheutz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011


pdf bib
The Indiana “Cooperative Remote Search Task” (CReST) Corpus
Kathleen Eberhard | Hannele Nicholson | Sandra Kübler | Susan Gundersen | Matthias Scheutz
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a novel corpus of natural language dialogues obtained from humans performing a cooperative, remote, search task (CReST) as it occurs naturally in a variety of scenarios (e.g., search and rescue missions in disaster areas). This corpus is unique in that it involves remote collaborations between two interlocutors who each have to perform tasks that require the other's assistance. In addition, one interlocutor's tasks require physical movement through an indoor environment as well as interactions with physical objects within the environment. The multi-modal corpus contains the speech signals as well as transcriptions of the dialogues, which are additionally annotated for dialog structure, disfluencies, and for constituent and dependency syntax. On the dialogue level, the corpus was annotated for separate dialogue moves, based on the classification developed by Carletta et al. (1997) for coding task-oriented dialogues. Disfluencies were annotated using the scheme developed by Lickley (1998). The syntactic annotation comprises POS annotation, Penn Treebank style constituent annotations as well as dependency annotations based on the dependencies of pennconverter.