Ronen Tamari


2022

Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking
Ronen Tamari | Kyle Richardson | Noam Kahlon | Aviad Sar-shalom | Nelson F. Liu | Reut Tsarfaty | Dafna Shahaf
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI as well as state-of-the-art pre-trained methods, and found that while both approaches solve the original tasks (>99% accuracy), neither approach succeeded in the compositional generalization setting, indicating the limitations of the original training data. We explored ways to augment the original data, and found that though diversifying training data was far more useful than simply increasing dataset size, it was still insufficient for driving robust compositional generalization (with <70% accuracy for complex compositions). Our results underscore the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.

2021

Process-Level Representation of Scientific Protocols with Interactive Annotation
Ronen Tamari | Fan Bai | Alan Ritter | Gabriel Stanovsky
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We develop Process Execution Graphs (PEG), a document-level representation of real-world wet lab biochemistry protocols, addressing challenges such as cross-sentence relations, long-range coreference, grounding, and implicit arguments. We manually annotate PEGs in a corpus of complex lab protocols with a novel interactive textual simulator that keeps track of entity traits and semantic constraints during annotation. We use this data to develop graph-prediction models, finding them to be good at entity identification and local relation extraction, while our corpus facilitates further exploration of challenging long-range relations.

2020

Language (Re)modelling: Towards Embodied Language Understanding
Ronen Tamari | Chen Shani | Tom Hope | Miriam R L Petruck | Omri Abend | Dafna Shahaf
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency, interpretability, and generalization. This work proposes an approach to representation and learning based on the tenets of embodied cognitive linguistics (ECL). According to ECL, natural language is inherently executable (like programming languages), driven by mental simulation and metaphoric mappings over hierarchical compositions of structures and schemata learned through embodied interaction. This position paper argues that the use of grounding by metaphoric reasoning and simulation will greatly benefit NLU systems, and proposes a system architecture along with a roadmap towards realizing this vision.

2019

Y’all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts
Gabriel Stanovsky | Ronen Tamari
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Distinguishing between singular and plural “you” in English is a challenging task which has potential for downstream applications, such as machine translation or coreference resolution. While formal written English does not distinguish between these cases, other languages (such as Spanish), as well as other dialects of English (via phrases such as “y’all”), do make this distinction. We make use of this to obtain distantly-supervised labels for the task at large scale in two domains. We then train a model to distinguish between singular and plural “you”, finding that although in-domain training achieves reasonable accuracy (≥ 77%), there is still a lot of room for improvement, especially in the domain-transfer scenario, which proves extremely challenging. Our code and data are publicly available.

Playing by the Book: An Interactive Game Approach for Action Graph Extraction from Text
Ronen Tamari | Hiroyuki Shindo | Dafna Shahaf | Yuji Matsumoto
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

Understanding procedural text requires tracking entities, actions and effects as the narrative unfolds. We focus on the challenging real-world problem of action-graph extraction from materials science papers, where language is highly specialized and data annotation is expensive and scarce. We propose a novel approach, Text2Quest, where procedural text is interpreted as instructions for an interactive game. A learning agent completes the game by executing the procedure correctly in a text-based simulated lab environment. The framework can complement existing approaches and enables richer forms of learning compared to static texts. We discuss potential limitations and advantages of the approach, and release a prototype proof-of-concept, hoping to encourage research in this direction.