Lenhart Schubert

Also published as: Len Schubert, Lenhart K. Schubert

2025

Boosting a Semantic Parser Using Treebank Trees Automatically Annotated with Unscoped Logical Forms
Miles Frank | Lenhart Schubert
Proceedings of the Sixth International Workshop on Designing Meaning Representations

Deriving structured semantic representations from unrestricted text, in a format suitable for sound, explainable reasoning, is an important goal for achieving AGI. Consequently, much effort has been invested in this goal, but the proposed representations fall short in various ways. Unscoped Logical Form (ULF) is a strictly typed, loss-free semantic representation close to surface form and conducive to linguistic inference. ULF can be further resolved into the more precise Episodic Logic. Previous transformer language models have shown promise in the task of parsing English to ULF, but suffered from the lack of a substantial training dataset. We present a new fine-tuned language model parser for ULF, trained on a greatly expanded dataset of ULFs automatically derived from Brown corpus Treebank parse trees. Additionally, the model uses Parameter-Efficient Fine-Tuning (PEFT) to leverage a substantially larger base model than its predecessor while maintaining fast training times. We find that training on automatically derived ULFs substantially improves parser performance compared with training on the existing smaller dataset (SEMBLEU score from 0.43 to 0.68), and even compared with the previously used larger, generatively augmented ULF dataset used with a transition parser (SEMBLEU score from 0.49 to 0.68).

2023

Semantically Informed Data Augmentation for Unscoped Episodic Logical Forms
Mandar Juvekar | Gene Kim | Lenhart Schubert
Proceedings of the 15th International Conference on Computational Semantics

Unscoped Logical Form (ULF) of Episodic Logic is a meaning representation format that captures the overall semantic type structure of natural language while leaving certain finer details, such as word sense and quantifier scope, underspecified for ease of parsing and annotation. While a learned parser exists to convert English to ULF, its performance is severely limited by the lack of a large dataset to train the system. We present a ULF dataset augmentation method that samples type-coherent ULF expressions using the ULF semantic type system and filters out samples corresponding to implausible English sentences using a pretrained language model. Our data augmentation method is configurable with parameters that trade off the plausibility of samples against sample novelty and augmentation size. We find that the best configuration of this augmentation method substantially improves parser performance beyond using the existing unaugmented dataset.

We Are What We Repeatedly Do: Inducing and Deploying Habitual Schemas in Persona-Based Responses
Benjamin Kane | Lenhart Schubert
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Many practical applications of dialogue technology require the generation of responses according to a particular developer-specified persona. While a variety of personas can be elicited from recent large language models, the opaqueness and unpredictability of these models make it desirable to be able to specify personas in an explicit form. In previous work, personas have typically been represented as sets of one-off pieces of self-knowledge that are retrieved by the dialogue system for use in generation. However, in realistic human conversations, personas are often revealed through story-like narratives that involve rich habitual knowledge – knowledge about kinds of events that an agent often participates in (e.g., work activities, hobbies, sporting activities, favorite entertainments, etc.), including typical goals, sub-events, preconditions, and postconditions of those events. We capture such habitual knowledge using an explicit schema representation, and propose an approach to dialogue generation that retrieves relevant schemas to condition a large language model to generate persona-based responses. Furthermore, we demonstrate a method for bootstrapping the creation of such schemas by first generating generic passages from a set of simple facts, and then inducing schemas from the generated passages.

2022

Logical Story Representations via FrameNet + Semantic Parsing
Lane Lawley | Lenhart Schubert
Proceedings of the Workshop on Dimensions of Meaning: Distributional and Curated Semantics (DistCurate 2022)

We propose a means of augmenting FrameNet parsers with a formal logic parser to obtain rich semantic representations of events. These schematic representations of the frame events, which we call Episodic Logic (EL) schemas, abstract constants into variables, preserving their types and relationships to other individuals in the same text. Due to the temporal semantics of the chosen logical formalism, all identified schemas in a text are also assigned temporally bound “episodes” and related to one another in time. The semantic role information from the FrameNet frames is also incorporated into the schema’s type constraints. We describe an implementation of this method using a neural FrameNet parser, and discuss the approach’s possible applications to question answering and open-domain event schema learning.

Mining Logical Event Schemas From Pre-Trained Language Models
Lane Lawley | Lenhart Schubert
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

We present NESL (the Neuro-Episodic Schema Learner), an event schema learning system that combines large language models, FrameNet parsing, a powerful logical representation of language, and a set of simple behavioral schemas meant to bootstrap the learning process. In lieu of a pre-made corpus of stories, our dataset is a continuous feed of “situation samples” from a pre-trained language model, which are then parsed into FrameNet frames, mapped into simple behavioral schemas, and combined and generalized into complex, hierarchical schemas for a variety of everyday scenarios. We show that careful sampling from the language model can help emphasize stereotypical properties of situations and de-emphasize irrelevant details, and that the resulting schemas specify situations more comprehensively than those learned by other systems.

2021

A (Mostly) Symbolic System for Monotonic Inference with Unscoped Episodic Logical Forms
Gene Kim | Mandar Juvekar | Junis Ekmekciu | Viet Duong | Lenhart Schubert
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

We implement the formalization of natural logic-like monotonic inference using Unscoped Episodic Logical Forms (ULFs) by Kim et al. (2020). We demonstrate this system’s capacity to handle a variety of challenging semantic phenomena using the FraCaS dataset (Cooper et al., 1996). These results give empirical evidence for prior claims that ULF is an appropriate representation to mediate natural logic-like inferences.

A Transition-based Parser for Unscoped Episodic Logical Forms
Gene Kim | Viet Duong | Xin Lu | Lenhart Schubert
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

“Episodic Logic: Unscoped Logical Form” (EL-ULF) is a semantic representation capturing predicate-argument structure as well as more challenging aspects of language within the Episodic Logic formalism. We present the first learned approach for parsing sentences into ULFs, using a growing set of annotated examples. The results provide a strong baseline for future improvement. Our method learns a sequence-to-sequence model for predicting the transition action sequence within a modified cache transition system. We evaluate the efficacy of type grammar-based constraints, a word-to-symbol lexicon, and transition system state features in this task. Our system is available at https://github.com/genelkim/ulf-transition-parser. We also present the first official annotated ULF dataset at https://www.cs.rochester.edu/u/gkim21/ulf/resources/.

Modeling Semantics and Pragmatics of Spatial Prepositions via Hierarchical Common-Sense Primitives
Georgiy Platonov | Yifei Yang | Haoyu Wu | Jonathan Waxman | Marcus Hill | Lenhart Schubert
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

Understanding spatial expressions and using them appropriately is necessary for seamless and natural human-machine interaction. However, capturing the semantics and appropriate usage of spatial prepositions is notoriously difficult, because of their vagueness and polysemy. Although modern data-driven approaches are good at capturing statistical regularities in the usage, they usually require substantial sample sizes, often do not generalize well to unseen instances and, most importantly, their structure is essentially opaque to analysis, which makes diagnosing problems and understanding their reasoning process difficult. In this work, we discuss our attempt at modeling spatial senses of prepositions in English using a combination of rule-based and statistical learning approaches. Each preposition model is implemented as a tree where each node computes certain intuitive relations associated with the preposition, with the root computing the final value of the prepositional relation itself. The models operate on a set of artificial 3D “room world” environments, designed in Blender, taking the scene itself as an input. We also discuss our annotation framework used to collect human judgments employed in the model training. Both our factored models and black-box baseline models perform quite well, but the factored models will enable reasoned explanations of spatial relation judgements.

Generating Justifications in a Spatial Question-Answering Dialogue System for a Blocks World
Georgiy Platonov | Benjamin Kane | Lenhart Schubert
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

As AI reaches wider adoption, designing systems that are explainable and interpretable becomes a critical necessity. In particular, when it comes to dialogue systems, their reasoning must be transparent and must comply with human intuitions in order for them to be integrated seamlessly into day-to-day collaborative human-machine activities. Here, we describe our ongoing work on a (general purpose) dialogue system equipped with a spatial specialist with explanatory capabilities. We applied this system to a particular task of characterizing spatial configurations of blocks in a simple physical Blocks World (BW) domain using natural locative expressions, as well as generating justifications for the proposed spatial descriptions by indicating the factors that the system used to arrive at a particular conclusion.

Monotonic Inference for Underspecified Episodic Logic
Gene Kim | Mandar Juvekar | Lenhart Schubert
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

We present a method of making natural logic inferences from Unscoped Logical Form of Episodic Logic. We establish a correspondence between inference rules of scope-resolved Episodic Logic and the natural logic treatment by Sánchez Valencia (1991a), and hence demonstrate the ability to handle foundational natural logic inferences from prior literature as well as more general nested monotonicity inferences.

Learning General Event Schemas with Episodic Logic
Lane Lawley | Benjamin Kuehnert | Lenhart Schubert
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

We present a system for learning generalized, stereotypical patterns of events (“schemas”) from natural language stories, and applying them to make predictions about other stories. Our schemas are represented with Episodic Logic, a logical form that closely mirrors natural language. By beginning with a “head start” set of protoschemas (schemas that a 1- or 2-year-old child would likely know), we can obtain useful, general world knowledge from very few story examples, often only one or two. Learned schemas can be combined into more complex, composite schemas, and used to make predictions in other stories where only partial information is available.

2020

A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World
Georgiy Platonov | Lenhart Schubert | Benjamin Kane | Aaron Gindi
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

A physical blocks world, despite its relative simplicity, requires (in fully interactive form) a rich set of functional capabilities, ranging from vision to natural language understanding. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialogue manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.

2019

Towards Natural Language Story Understanding with Rich Logical Schemas
Lane Lawley | Gene Louis Kim | Lenhart Schubert
Proceedings of the Sixth Workshop on Natural Language and Computer Science

Generating “commonsense” knowledge for intelligent understanding and reasoning is a difficult, long-standing problem, whose scale challenges the capacity of any approach driven primarily by human input. Furthermore, approaches based on mining statistically repetitive patterns fail to produce the rich representations humans acquire, and fall far short of human efficiency in inducing knowledge from text. The idea of our approach to this problem is to provide a learning system with a “head start” consisting of a semantic parser, some basic ontological knowledge, and most importantly, a small set of very general schemas about the kinds of patterns of events (often purposive, causal, or socially conventional) that even a one- or two-year-old could reasonably be presumed to possess. We match these initial schemas to simple children’s stories, obtaining concrete instances, and combining and abstracting these into new candidate schemas. Both the initial and generated schemas are specified using a rich, expressive logical form. While modern approaches to schema reasoning often use only slot-and-filler structures, this logical form allows us to specify complex relations and constraints over the slots. Though formal, the representations are language-like, and as such readily relatable to NL text. The agents, objects, and other roles in the schemas are represented by typed variables, and the event variables can be related through partial temporal ordering and causal relations. To match natural language stories with existing schemas, we first parse the stories into an underspecified variant of the logical form used by the schemas, which is suitable for most concrete stories. We include a walkthrough of matching a children’s story to these schemas and generating inferences from these matches.

Unscoped episodic logical form (ULF) is a semantic representation capturing the predicate-argument structure of English within the episodic logic formalism in relation to the syntactic structure, while leaving scope, word sense, and anaphora unresolved. We describe how ULF can be used to generate natural language inferences that are grounded in the semantic and syntactic structure through a small set of rules defined over interpretable predicates and transformations on ULFs. The semantic restrictions imposed by ULF semantic types enable us to ensure that the inferred structures are semantically coherent, while the nearness to syntax enables accurate mapping to English. We demonstrate these inferences on four classes of conversationally oriented inferences in a mixed-genre dataset, achieving 68.5% precision according to human judgments.

A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
Gene Louis Kim | Lenhart Schubert
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

The NLP community's growing interest in tasks involving language understanding has created a need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.

2018

Computational Models for Spatial Prepositions
Georgiy Platonov | Lenhart Schubert
Proceedings of the First International Workshop on Spatial Language Understanding

Developing computational models of spatial prepositions (such as on, in, above, etc.) is crucial for such tasks as human-machine collaboration, story understanding, and 3D model generation from descriptions. However, these prepositions are notoriously vague and ambiguous, with meanings depending on the types, shapes and sizes of entities in the argument positions, the physical and task context, and other factors. As a result, truth-value judgments for prepositional relations are often uncertain and variable. In this paper we treat the modeling task as calling for assignment of probabilities to such relations as a function of multiple factors, where such probabilities can be viewed as estimates of whether humans would judge the relations to hold in given circumstances. We implemented our models in a 3D blocks world and a room world in a computer graphics setting, and found that true/false judgments based on these models do not differ much more from human judgments than the latter differ from one another. However, what really matters pragmatically is not the accuracy of truth-value judgments but whether, for instance, the computer models suffice for identifying objects described in terms of prepositional relations (e.g., “the box to the left of the table”, where there are multiple boxes). For such tasks, our models achieved accuracies above 90% for most relations.

2017

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation
Gene Kim | Lenhart Schubert
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

This paper describes current efforts in developing an annotation schema and guidelines for sentences in Episodic Logic (EL). We focus on important distinctions for representing modality, attitudes, and tense and present an annotation schema that makes these distinctions. EL has proved competitive with other logical formulations in speed and inference-enablement, while expressing a wider array of natural language phenomena including intensional modification of predicates and sentences, propositional attitudes, and tense and aspect.