@inproceedings{van-der-goot-etal-2024-enough-enough,
title = "Enough Is Enough! a Case Study on the Effect of Data Size for Evaluation Using {U}niversal {D}ependencies",
author = {van der Goot, Rob and
Liu, Zoey and
M{\"u}ller-Eberstein, Max},
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.544",
pages = "6167--6176",
abstract = "When creating a new dataset for evaluation, one of the first considerations is the size of the dataset. If our evaluation data is too small, we risk making unsupported claims based on the results on such data. If, on the other hand, the data is too large, we waste valuable annotation time and costs that could have been used to widen the scope of our evaluation (i.e. annotate for more domains/languages). Hence, we investigate the effect of the size and a variety of sampling strategies of evaluation data to optimize annotation efforts, using dependency parsing as a test case. We show that for in-language in-domain datasets, 5,000 tokens is enough to obtain a reliable ranking of different parsers; especially if the data is distant enough from the training split (otherwise, we recommend 10,000). In cross-domain setups, the same amounts are required, but in cross-lingual setups much less (2,000 tokens) is enough.",
}
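For convenience, a minimal sketch of how this entry can be cited from a LaTeX document; the bibliography file name (references.bib) and the natbib setup are assumptions, only the citation key comes from the BibTeX record above.

\documentclass{article}
\usepackage[round]{natbib} % assumption: natbib for author-year citations

\begin{document}
% Cite the entry via the key from the BibTeX record above.
\citet{van-der-goot-etal-2024-enough-enough} study how much evaluation
data is needed to rank dependency parsers reliably.

\bibliographystyle{plainnat}
\bibliography{references} % assumption: the entry above is saved as references.bib
\end{document}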
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="van-der-goot-etal-2024-enough-enough">
<titleInfo>
<title>Enough Is Enough! a Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies</title>
</titleInfo>
<name type="personal">
<namePart type="given">Rob</namePart>
<namePart type="family">van der Goot</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zoey</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Max</namePart>
<namePart type="family">Müller-Eberstein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nicoletta</namePart>
<namePart type="family">Calzolari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Min-Yen</namePart>
<namePart type="family">Kan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Veronique</namePart>
<namePart type="family">Hoste</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alessandro</namePart>
<namePart type="family">Lenci</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakriani</namePart>
<namePart type="family">Sakti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nianwen</namePart>
<namePart type="family">Xue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>ELRA and ICCL</publisher>
<place>
<placeTerm type="text">Torino, Italia</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>When creating a new dataset for evaluation, one of the first considerations is the size of the dataset. If our evaluation data is too small, we risk making unsupported claims based on the results on such data. If, on the other hand, the data is too large, we waste valuable annotation time and costs that could have been used to widen the scope of our evaluation (i.e. annotate for more domains/languages). Hence, we investigate the effect of the size and a variety of sampling strategies of evaluation data to optimize annotation efforts, using dependency parsing as a test case. We show that for in-language in-domain datasets, 5,000 tokens is enough to obtain a reliable ranking of different parsers; especially if the data is distant enough from the training split (otherwise, we recommend 10,000). In cross-domain setups, the same amounts are required, but in cross-lingual setups much less (2,000 tokens) is enough.</abstract>
<identifier type="citekey">van-der-goot-etal-2024-enough-enough</identifier>
<location>
<url>https://aclanthology.org/2024.lrec-main.544</url>
</location>
<part>
<date>2024-05</date>
<extent unit="page">
<start>6167</start>
<end>6176</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Enough Is Enough! A Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies
%A van der Goot, Rob
%A Liu, Zoey
%A Müller-Eberstein, Max
%Y Calzolari, Nicoletta
%Y Kan, Min-Yen
%Y Hoste, Veronique
%Y Lenci, Alessandro
%Y Sakti, Sakriani
%Y Xue, Nianwen
%S Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
%D 2024
%8 May
%I ELRA and ICCL
%C Torino, Italia
%F van-der-goot-etal-2024-enough-enough
%X When creating a new dataset for evaluation, one of the first considerations is the size of the dataset. If our evaluation data is too small, we risk making unsupported claims based on the results obtained on it. If, on the other hand, the data is too large, we waste valuable annotation time and costs that could have been used to widen the scope of our evaluation (i.e., to annotate more domains/languages). Hence, we investigate the effect of the size of evaluation data, along with a variety of sampling strategies, to optimize annotation efforts, using dependency parsing as a test case. We show that for in-language, in-domain datasets, 5,000 tokens is enough to obtain a reliable ranking of different parsers, especially if the data is distant enough from the training split (otherwise, we recommend 10,000). In cross-domain setups, the same amounts are required, but in cross-lingual setups far less data (2,000 tokens) is enough.
%U https://aclanthology.org/2024.lrec-main.544
%P 6167-6176
Markdown (Informal)
[Enough Is Enough! A Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies](https://aclanthology.org/2024.lrec-main.544) (van der Goot et al., LREC-COLING 2024)
ACL
Rob van der Goot, Zoey Liu, and Max Müller-Eberstein. 2024. Enough Is Enough! A Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6167–6176, Torino, Italia. ELRA and ICCL.