@inproceedings{sakamoto-etal-2025-development,
    title = "Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models",
    author = "Sakamoto, Taku and
      Sugawara, Saku and
      Aizawa, Akiko",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.666/",
    pages = "9957--9976",
    abstract = "Numbers are used to describe quantities in various scenarios in daily life; therefore, numerical errors can significantly affect the meaning of the entire sentence, and even a single-letter error can be fatal. Detecting numerical errors often requires a high level of commonsense and is difficult even with the recent large language models (LLMs). In this study, we create a benchmark dataset of numerical error detection that uses automatically generated numerical errors. In our analysis, we classify the numerical errors based on the properties of the errors and investigate the ability of the model from several perspectives, including the error class, error size, and passage domain. The experimental results indicate that GPT-3.5, GPT-4, and Llama-3-Instruct (8B) perform well in the numerical error detection task; however, they are not as accurate as humans. We find that the LLMs misidentified correct numbers as errors more frequently than the humans did. In particular, the analysis demonstrates that the current LLMs still need improvement for detecting numerical errors requiring calculations or extensive prior knowledge."
}
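If the entry needs to be read programmatically rather than copied into a reference manager, a dedicated BibTeX library is the robust choice; the snippet below is only a minimal sketch using the Python standard library. It assumes the entry above is saved verbatim in a file named `sakamoto2025.bib` (a placeholder name), and it does not handle nested braces, unquoted values such as `month = jan`, or files with multiple entries.

```python
import re

# Minimal sketch: pull the quoted fields out of the single entry above.
# Assumes the entry is stored verbatim in sakamoto2025.bib (placeholder name);
# a real project should use a proper BibTeX parser instead of this regex.
with open("sakamoto2025.bib", encoding="utf-8") as f:
    bibtex = f.read()

citekey = re.search(r"@inproceedings\{([^,]+),", bibtex).group(1)

# field = "value" pairs; quoted values may span several lines, so collapse whitespace
fields = {
    key: re.sub(r"\s+", " ", value.strip())
    for key, value in re.findall(r'(\w+)\s*=\s*"([^"]*)"', bibtex, re.S)
}

print(citekey)                      # sakamoto-etal-2025-development
print(fields["title"])
print(fields["pages"])              # 9957--9976
print([a.strip() for a in fields["author"].split(" and ")])
```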
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="sakamoto-etal-2025-development">
    <titleInfo>
      <title>Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Taku</namePart>
      <namePart type="family">Sakamoto</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Saku</namePart>
      <namePart type="family">Sugawara</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Akiko</namePart>
      <namePart type="family">Aizawa</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-01</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 31st International Conference on Computational Linguistics</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Owen</namePart>
        <namePart type="family">Rambow</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Leo</namePart>
        <namePart type="family">Wanner</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Marianna</namePart>
        <namePart type="family">Apidianaki</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Hend</namePart>
        <namePart type="family">Al-Khalifa</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Barbara</namePart>
        <namePart type="given">Di</namePart>
        <namePart type="family">Eugenio</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Steven</namePart>
        <namePart type="family">Schockaert</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Abu Dhabi, UAE</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Numbers are used to describe quantities in various scenarios in daily life; therefore, numerical errors can significantly affect the meaning of the entire sentence, and even a single-letter error can be fatal. Detecting numerical errors often requires a high level of commonsense and is difficult even with the recent large language models (LLMs). In this study, we create a benchmark dataset of numerical error detection that uses automatically generated numerical errors. In our analysis, we classify the numerical errors based on the properties of the errors and investigate the ability of the model from several perspectives, including the error class, error size, and passage domain. The experimental results indicate that GPT-3.5, GPT-4, and Llama-3-Instruct (8B) perform well in the numerical error detection task; however, they are not as accurate as humans. We find that the LLMs misidentified correct numbers as errors more frequently than the humans did. In particular, the analysis demonstrates that the current LLMs still need improvement for detecting numerical errors requiring calculations or extensive prior knowledge.</abstract>
    <identifier type="citekey">sakamoto-etal-2025-development</identifier>
    <location>
      <url>https://aclanthology.org/2025.coling-main.666/</url>
    </location>
    <part>
      <date>2025-01</date>
      <extent unit="page">
        <start>9957</start>
        <end>9976</end>
      </extent>
    </part>
  </mods>
</modsCollection>
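The same metadata can be read from the MODS record with nothing beyond the Python standard library. The sketch below assumes the XML above is saved as `sakamoto2025.xml` (a placeholder name) and shows how the namespaced elements map onto the title, the author names, and the page range; editors are nested under `<relatedItem type="host">` and are deliberately not picked up here.

```python
import xml.etree.ElementTree as ET

# Minimal sketch: read the MODS record above (saved as sakamoto2025.xml,
# a placeholder name) and extract a few common fields.
NS = {"m": "http://www.loc.gov/mods/v3"}

mods = ET.parse("sakamoto2025.xml").getroot().find("m:mods", NS)

title = mods.findtext("m:titleInfo/m:title", namespaces=NS)

# Authors are direct <name> children of <mods>; editors sit inside
# <relatedItem type="host"> and are therefore not matched by this path.
authors = [
    " ".join(part.text for part in name.findall("m:namePart", NS))
    for name in mods.findall("m:name", NS)
]

pages = (
    mods.findtext("m:part/m:extent/m:start", namespaces=NS),
    mods.findtext("m:part/m:extent/m:end", namespaces=NS),
)

print(title)
print(authors)   # ['Taku Sakamoto', 'Saku Sugawara', 'Akiko Aizawa']
print(pages)     # ('9957', '9976')
```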
%0 Conference Proceedings
%T Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models
%A Sakamoto, Taku
%A Sugawara, Saku
%A Aizawa, Akiko
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Eugenio, Barbara Di
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F sakamoto-etal-2025-development
%X Numbers are used to describe quantities in various scenarios in daily life; therefore, numerical errors can significantly affect the meaning of the entire sentence, and even a single-letter error can be fatal. Detecting numerical errors often requires a high level of commonsense and is difficult even with the recent large language models (LLMs). In this study, we create a benchmark dataset of numerical error detection that uses automatically generated numerical errors. In our analysis, we classify the numerical errors based on the properties of the errors and investigate the ability of the model from several perspectives, including the error class, error size, and passage domain. The experimental results indicate that GPT-3.5, GPT-4, and Llama-3-Instruct (8B) perform well in the numerical error detection task; however, they are not as accurate as humans. We find that the LLMs misidentified correct numbers as errors more frequently than the humans did. In particular, the analysis demonstrates that the current LLMs still need improvement for detecting numerical errors requiring calculations or extensive prior knowledge.
%U https://aclanthology.org/2025.coling-main.666/
%P 9957-9976
Markdown (Informal)
[Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models](https://aclanthology.org/2025.coling-main.666/) (Sakamoto et al., COLING 2025)
ACL
Taku Sakamoto, Saku Sugawara, and Akiko Aizawa. 2025. Development of Numerical Error Detection Tasks to Analyze the Numerical Capabilities of Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9957–9976, Abu Dhabi, UAE. Association for Computational Linguistics.