Predicting Fine-tuned Performance on Larger Datasets Before Creating Them
Toshiki Kuramoto | Jun Suzuki
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track (2025)
This paper proposes a method for estimating the performance of a pretrained model fine-tuned on a larger dataset from results obtained with a smaller one. Specifically, we demonstrate that when a pretrained model is fine-tuned, its classification performance improves at the same overall rate as the number of epochs increases, regardless of the original dataset size. We then verify that an approximate formula based on this trend can predict the performance of a model trained with ten times or more training data, even when the initially available training dataset is limited. Our results show that this approach can help resource-limited companies develop machine-learning models.
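The abstract describes fitting an approximate formula to the fine-tuning trend observed on a small dataset and extrapolating it; the exact functional form is not given here. The sketch below is a minimal illustration, not the authors' implementation: it assumes a logarithmic performance-vs-epoch curve and hypothetical accuracy values, and uses `scipy.optimize.curve_fit` to fit and extrapolate the trend.

```python
# Minimal sketch (assumptions: logarithmic trend, made-up accuracy values).
# The paper's actual approximate formula is not reproduced here.
import numpy as np
from scipy.optimize import curve_fit

def log_curve(epoch, a, b):
    # Assumed saturating trend: performance grows roughly log-linearly in epochs.
    return a + b * np.log(epoch)

# Hypothetical accuracies measured after each epoch on the small dataset.
epochs = np.array([1, 2, 3, 4, 5], dtype=float)
small_data_acc = np.array([0.62, 0.68, 0.71, 0.73, 0.745])

# Fit the approximate formula to the observed small-dataset results.
params, _ = curve_fit(log_curve, epochs, small_data_acc)

# Extrapolate the fitted trend to a later point on the curve, standing in for
# the performance one hopes to reach with a much larger training dataset.
projected = log_curve(20.0, *params)
print(f"fitted a={params[0]:.3f}, b={params[1]:.3f}, projected accuracy={projected:.3f}")
```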