CTYUN-AI at SemEval-2024 Task 7: Boosting Numerical Understanding with Limited Data Through Effective Data Alignment

Yuming Fan; Dongming Yang; Xu He

doi:10.18653/v1/2024.semeval-1.8

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in pushing the boundaries of natural language understanding. Nevertheless, most existing open-source LLMs still fall short on numerical problems, especially because improving their numerical capabilities relies heavily on extensive data. To bridge this gap, we aim to improve the numerical understanding of LLMs through efficient data alignment, using only a limited amount of necessary data. Specifically, we first use a data discovery strategy to obtain the most effective portion of numerical data from large datasets. Second, self-augmentation is performed to maximize the potential of the training data. Third, the answers of all training samples are aligned based on some simple rules. Our method achieved first place in the competition, offering new insights and methodologies for numerical understanding research in LLMs.

Anthology ID: 2024.semeval-1.8
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 47–52
URL: https://aclanthology.org/2024.semeval-1.8/
DOI: 10.18653/v1/2024.semeval-1.8
Cite (ACL): Yuming Fan, Dongming Yang, and Xu He. 2024. CTYUN-AI at SemEval-2024 Task 7: Boosting Numerical Understanding with Limited Data Through Effective Data Alignment. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 47–52, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): CTYUN-AI at SemEval-2024 Task 7: Boosting Numerical Understanding with Limited Data Through Effective Data Alignment (Fan et al., SemEval 2024)
PDF: https://aclanthology.org/2024.semeval-1.8.pdf
Supplementary material: 2024.semeval-1.8.SupplementaryMaterial.txt
