Spyridon Mouselinos
2024
Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
Spyridon Mouselinos
|
Henryk Michalewski
|
Mateusz Malinowski
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored. We investigate LLMs’ abilities in constructive geometric problem-solving, – one of the most fundamental steps in developing human mathematical reasoning, revealing notable challenges in this domain. LLMs exhibit biases in variable names, struggle with 2D spatial relationships and planning, and hallucinate object placements. To this end, we introduce a framework that enhances LLMs’ reasoning potential through a multi-agent system conducting internal dialogue. This work underscores LLMs’ limitations in geometric reasoning and improves their capabilities through self-correction, collaboration, and diverse role specializations.
2023
A Simple, Yet Effective Approach to Finding Biases in Code Generation
Spyridon Mouselinos
|
Mateusz Malinowski
|
Henryk Michalewski
Findings of the Association for Computational Linguistics: ACL 2023
Recently, high-performing code generation systems based on large language models have surfaced. They are trained on massive corpora containing much more natural text than actual executable computer code. This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones, which can reduce the quality of the generated code under specific circumstances. To investigate the effect, we propose the “block of influence” concept, which enables a modular decomposition and analysis of the coding challenges. We introduce an automated intervention mechanism reminiscent of adversarial testing that exposes undesired biases through the failure modes of the models under test. Finally, we demonstrate how our framework can be used as a data transformation technique during fine-tuning, acting as a mitigation strategy for these biases.