TMATH A Dataset for Evaluating Large Language Models in Generating Educational Hints for Math Word Problems

Changyong Qi; Yuang Wei; Haoxin Xu; Longwei Zheng; Peiji Chen; Xiaoqing Gu

Abstract

Large Language Models (LLMs) are increasingly being applied in education, showing significant potential in personalized instruction, student feedback, and intelligent tutoring. Generating hints for Math Word Problems (MWPs) has become a critical application, particularly in helping students understand problem-solving steps and logic. However, existing models struggle to provide pedagogically sound guidance that fosters learning without offering direct answers. To address this issue, we introduce TMATH, a dataset specifically designed to evaluate LLMs’ ability to generate high-quality hints for MWPs. TMATH contains diverse mathematical problems paired with carefully crafted, human-generated hints. To assess its impact, we fine-tuned a series of 7B-scale language models using TMATH. Our results, based on quantitative evaluations and expert assessments, show that while LLMs still face challenges in complex reasoning, the TMATH dataset significantly enhances their ability to generate more accurate and contextually appropriate educational hints.

Anthology ID: 2025.coling-main.340
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5082–5093
URL: https://aclanthology.org/2025.coling-main.340/
Cite (ACL): Changyong Qi, Yuang Wei, Haoxin Xu, Longwei Zheng, Peiji Chen, and Xiaoqing Gu. 2025. TMATH A Dataset for Evaluating Large Language Models in Generating Educational Hints for Math Word Problems. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5082–5093, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): TMATH A Dataset for Evaluating Large Language Models in Generating Educational Hints for Math Word Problems (Qi et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.340.pdf
