Analogy-making is central to human cognition, allowing us to adapt to novel situations – an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets including complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy.In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator) leveraging state-of-the-art Large Language Models (LLMs) to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate our pipeline and create ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold-set, validated by humans, and a silver-set, generated automatically. We test LLMs’ and humans’ analogy recognition in binary and multiple-choice settings, and found that humans outperform the best models (∼13% gap) after a light supervision. We demonstrate that our silver-set is useful for training models. Lastly, we show challenging distractors confuse LLMs, but not humans. We hope our pipeline will encourage research in this emerging field.
Analogy-making gives rise to reasoning, abstraction, flexible categorization and counterfactual inference – abilities lacking in even the best AI systems today. Much research has suggested that analogies are key to non-brittle systems that can adapt to new domains. Despite their importance, analogies received little attention in the NLP community, with most research focusing on simple word analogies. Work that tackled more complex analogies relied heavily on manually constructed, hard-to-scale input representations.In this work, we explore a more realistic, challenging setup: our input is a pair of natural language procedural texts, describing a situation or a process (e.g., how the heart works/how a pump works). Our goal is to automatically extract entities and their relations from the text and find a mapping between the different domains based on relational similarity (e.g., blood is mapped to water). We develop an interpretable, scalable algorithm and demonstrate that it identifies the correct mappings 87% of the time for procedural texts and 94% for stories from cognitive-psychology literature. We show it can extract analogies from a large dataset of procedural texts, achieving 79% precision (analogy prevalence in data: 3%). Lastly, we demonstrate that our algorithm is robust to paraphrasing the input texts
Can we teach models designed for language understanding tasks to track and improve their beliefs through intermediate points in text? Besides making their inner workings more transparent, this would also help make models more reliable and consistent. To this end, we propose a representation learning framework called breakpoint modeling that allows for efficient and robust learning of this type. Given any text encoder and data marked with intermediate states (breakpoints) along with corresponding textual queries viewed as true/false propositions (i.e., the candidate intermediate beliefs of a model), our approach trains models in an efficient and end-to-end fashion to build intermediate representations that facilitate direct querying and training of beliefs at arbitrary points in text, alongside solving other end-tasks. We evaluate breakpoint modeling on a diverse set of NLU tasks including relation reasoning on Cluttr and narrative understanding on bAbI. Using novel proposition prediction tasks alongside these end-tasks, we show the benefit of our T5-based breakpoint transformer over strong conventional representation learning approaches in terms of processing efficiency, belief accuracy, and belief consistency, all with minimal to no degradation on the end-task. To show the feasibility of incorporating our belief tracker into more complex reasoning pipelines, we also obtain state-of-the-art performance on the three-tiered reasoning challenge for the recent TRIP benchmark (23-32% absolute improvement on Tasks 2-3).