Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models

Romina Soledad Albornoz-De Luise, David Arnau, Pablo Arnau-González, Miguel Arevalillo-Herráez


Abstract
Despite not being explicitly trained for this purpose, large language models (LLMs) such as Mistral and LLaMA have demonstrated impressive results across numerous tasks, including generating solutions to Mathematical Word Problems (MWPs). Solving an MWP involves translating a textual description into a mathematical model or equation and then solving it. However, these models face challenges in accurately interpreting and utilizing the numerical information present in MWP statements, which can lead to errors in the generated solutions. To better understand the limitations of LLMs, we analyzed the MWPs from the SVAMP dataset that the models failed to solve accurately. By categorizing these MWPs, we identify the specific types of problems on which the models are most prone to errors, providing insights into the underlying challenges faced by LLMs in problem-solving scenarios and opening new modeling opportunities. By understanding the expected errors, researchers can design strategies to model problems more effectively and choose the most suitable LLM for solving them, taking into account each model’s strengths and weaknesses.
Anthology ID:
2024.practicald2t-1.1
Volume:
Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Simone Balloccu, Zdeněk Kasner, Ondřej Plátek, Patrícia Schmidtová, Kristýna Onderková, Mateusz Lango, Ondřej Dušek, Lucie Flek, Ehud Reiter, Dimitra Gkatzia, Simon Mille
Venues:
PracticalD2T | WS
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/2024.practicald2t-1.1
Cite (ACL):
Romina Soledad Albornoz-De Luise, David Arnau, Pablo Arnau-González, and Miguel Arevalillo-Herráez. 2024. Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models. In Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation, pages 1–6, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models (Albornoz-De Luise et al., PracticalD2T-WS 2024)
PDF:
https://aclanthology.org/2024.practicald2t-1.1.pdf