ZXQ at SemEval-2024 Task 7: Fine-tuning GPT-3.5-Turbo for Numerical Reasoning

Zhen Qian, Xiaofei Xu, Xiuzhen Zhang


Abstract
In this paper, we present our system for the SemEval-2024 Task 7, i.e., NumEval subtask 3: Numericial Reasoning. Given a news article and its headline, the numerical reasoning task involves creating a system to compute the intentionally excluded number within the news headline. We propose a fine-tuned GPT-3.5-turbo model, specifically engineered to deduce missing numerals directly from the content of news article. The model is trained with a human-engineered prompt that itegrates the news content and the masked headline, tailoring its accuracy for the designated task. It achieves an accuracy of 0.94 on the test data and secures the second position in the official leaderboard. An examination on the system’s inference results reveals its commendable accuracy in identifying correct numerals when they can be directly “copied” from the articles. However, the error rates increase when it comes to some ambiguous operations such as rounding.
Anthology ID:
2024.semeval-1.34
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
218–223
Language:
URL:
https://aclanthology.org/2024.semeval-1.34
DOI:
10.18653/v1/2024.semeval-1.34
Bibkey:
Cite (ACL):
Zhen Qian, Xiaofei Xu, and Xiuzhen Zhang. 2024. ZXQ at SemEval-2024 Task 7: Fine-tuning GPT-3.5-Turbo for Numerical Reasoning. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 218–223, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
ZXQ at SemEval-2024 Task 7: Fine-tuning GPT-3.5-Turbo for Numerical Reasoning (Qian et al., SemEval 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.semeval-1.34.pdf
Supplementary material:
 2024.semeval-1.34.SupplementaryMaterial.txt