pdf
bib
Proceedings of the 4th Natural Logic Meets Machine Learning Workshop
Stergios Chatzikyriakidis
|
Valeria de Paiva
pdf
bib
abs
Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases
Risako Ando
|
Takanobu Morishita
|
Hirohiko Abe
|
Koji Mineshima
|
Mitsuhiro Okada
This paper investigates whether current large language models exhibit biases in logical reasoning, similar to humans. Specifically, we focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction. To facilitate our analysis, we introduce a dataset called NeuBAROCO, originally designed for psychological experiments that assess human logical abilities in syllogistic reasoning. The dataset consists of syllogistic inferences in both English and Japanese. We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects. Our findings demonstrate that current large language models struggle more with problems involving these three types of biases.
pdf
bib
abs
SpaceNLI: Evaluating the Consistency of Predicting Inferences In Space
Lasha Abzianidze
|
Joost Zwarts
|
Yoad Winter
While many natural language inference (NLI) datasets target certain semantic phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to the best of our knowledge, there is no NLI dataset that involves diverse types of spatial expressions and reasoning. We fill this gap by semi-automatically creating an NLI dataset for spatial reasoning, called SpaceNLI. The data samples are automatically generated from a curated set of reasoning patterns (see Figure 1), where the patterns are annotated with inference labels by experts. We test several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and the system’s capacity for spatial reasoning. Moreover, we introduce a Pattern Accuracy and argue that it is a more reliable and stricter measure than the accuracy for evaluating a system’s performance on pattern-based generated data samples. Based on the evaluation results we find that the systems obtain moderate results on the spatial NLI problems but lack consistency per inference pattern. The results also reveal that non-projective spatial inferences (especially due to the “between” preposition) are the most challenging ones.
pdf
bib
abs
Does ChatGPT Resemble Humans in Processing Implicatures?
Zhuang Qiu
|
Xufeng Duan
|
Zhenguang Cai
Recent advances in large language models (LLMs) and LLM-driven chatbots, such as ChatGPT, have sparked interest in the extent to which these artificial systems possess human-like linguistic abilities. In this study, we assessed ChatGPT’s pragmatic capabilities by conducting three preregistered experiments focused on its ability to compute pragmatic implicatures. The first experiment tested whether ChatGPT inhibits the computation of generalized conversational implicatures (GCIs) when explicitly required to process the text’s truth-conditional meaning. The second and third experiments examined whether the communicative context affects ChatGPT’s ability to compute scalar implicatures (SIs). Our results showed that ChatGPT did not demonstrate human-like flexibility in switching between pragmatic and semantic processing. Additionally, ChatGPT’s judgments did not exhibit the well-established effect of communicative context on SI rates.
pdf
bib
abs
Recurrent Neural Network CCG Parser
Sora Tagami
|
Daisuke Bekki
The two contrasting approaches are end-to-end neural NLI systems and linguistically-oriented NLI pipelines consisting of modules such as neural CCG parsers and theorem provers. The latter, however, faces the challenge of integrating the neural models used in the syntactic and semantic components. RNNGs are frameworks that can potentially fill this gap, but conventional RNNGs adopt CFG as the syntactic theory. To address this issue, we implemented RNN-CCG, a syntactic parser that replaces CFG with CCG. We then conducted experiments comparing RNN-CCG to RNNGs with/without POS tags and evaluated their behavior as a first step towards building an NLI system based on RNN-CCG.
pdf
bib
abs
TTR at the SPA: Relating type-theoretical semantics to neural semantic pointers
Staffan Larsson
|
Robin Cooper
|
Jonathan Ginzburg
|
Andy Luecking
This paper considers how the kind of formal semantic objects used in TTR (a theory of types with records, Cooper 2013) might be related to the vector representations used in Eliasmith (2013). An advantage of doing this is that it would immediately give us a neural representation for TTR objects as Eliasmith relates vectors to neural activity in his semantic pointer architecture (SPA). This would be an alternative using convolution to the suggestions made by Cooper (2019) based on the phasing of neural activity. The project seems potentially hopeful since all complex TTR objects are constructed from labelled sets (essentially sets of ordered pairs consisting of labels and values) which might be seen as corresponding to the representation of structured objects which Eliasmith achieves using superposition and circular convolution.
pdf
bib
abs
Triadic temporal representations and deformations
Tim Fernando
Triadic representations that temporally order events and states are described, consisting of strings and sets of strings of bounded but refinable granularities. The strings are compressed according to J.A. Wheeler’s dictum it-from-bit, with bits given by statives and non-statives alike. A choice of vocabulary and constraints expressed in that vocabulary shape representations of cause-and-effect with deformations characteristic, Mumford posits, of patterns at various levels of cognitive processing. These deformations point to an ongoing process of learning, formulated as grammatical inference of finite automata, structured around Goguen and Burstall’s institutions.
pdf
bib
abs
Discourse Representation Structure Parsing for Chinese
Chunliu Wang
|
Xiao Zhang
|
Johan Bos
Previous work has predominantly focused on monolingual English semantic parsing. We, instead, explore the feasibility of Chinese semantic parsing in the absence of labeled data for Chinese meaning representations. We describe the pipeline of automatically collecting the linearized Chinese meaning representation data for sequential-to-sequential neural networks. We further propose a test suite designed explicitly for Chinese semantic parsing, which provides fine-grained evaluation for parsing performance, where we aim to study Chinese parsing difficulties. Our experimental results show that the difficulty of Chinese semantic parsing is mainly caused by adverbs. Realizing Chinese parsing through machine translation and an English parser yields slightly lower performance than training a model directly on Chinese data.