2023
Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values
Stephanie Schoch | Ritwick Mishra | Yangfeng Ji
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Although Shapley values have been shown to be highly effective for identifying harmful training instances, dataset size and model complexity constraints limit the ability to apply Shapley-based data valuation to fine-tuning large pre-trained language models. To address this, we propose TS-DShapley, an algorithm that reduces the computational cost of Shapley-based data valuation through: 1) an efficient sampling-based method that aggregates Shapley values computed from subsets for valuation of the entire training set, and 2) a value transfer method that leverages value information extracted from a simple classifier trained using representations from the target language model. Our experiments applying TS-DShapley to select data for fine-tuning BERT-based language models on benchmark natural language understanding (NLU) datasets show that TS-DShapley outperforms existing data selection methods. Further, TS-DShapley can filter fine-tuning data to increase language model performance compared to training with the full fine-tuning dataset.
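As a rough illustration of the kind of Shapley-based data valuation the abstract describes (this is not the paper's TS-DShapley algorithm itself), the sketch below estimates per-instance values with truncated Monte Carlo sampling, using a simple scikit-learn classifier over fixed feature vectors standing in for language-model representations. All function names, parameters, and the toy data are illustrative assumptions.

```python
# Minimal sketch of Monte Carlo Shapley-based data valuation with a simple
# proxy classifier over fixed representations. Illustrative only; it is NOT
# the TS-DShapley algorithm from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def utility(X_train, y_train, X_val, y_val):
    """Validation accuracy of a simple classifier trained on a data subset."""
    if len(np.unique(y_train)) < 2:  # empty or single-class subset
        return np.mean(y_val == y_train[0]) if len(y_train) else 0.0
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf.score(X_val, y_val)

def mc_shapley(X, y, X_val, y_val, num_permutations=20, tol=0.01, seed=None):
    """Truncated Monte Carlo estimate of per-instance Shapley values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    full_score = utility(X, y, X_val, y_val)
    for _ in range(num_permutations):
        perm = rng.permutation(n)
        prev_score = utility(X[:0], y[:0], X_val, y_val)  # empty-set utility
        for i, idx in enumerate(perm, start=1):
            # Truncation: once the subset's utility is close to the full-data
            # utility, remaining marginal contributions are treated as ~0.
            if abs(full_score - prev_score) < tol:
                break
            score = utility(X[perm[:i]], y[perm[:i]], X_val, y_val)
            values[idx] += score - prev_score
            prev_score = score
    return values / num_permutations

if __name__ == "__main__":
    # Toy demo with random "embeddings" standing in for LM representations.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
    X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]
    vals = mc_shapley(X_tr, y_tr, X_val, y_val, seed=0)
    keep = np.argsort(vals)[::-1][:100]  # keep the highest-valued instances
    print("Selected", len(keep), "of", len(y_tr), "training instances")
```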
Barriers and enabling factors for error analysis in NLG research
Emiel van Miltenburg | Miruna Clinciu | Ondřej Dušek | Dimitra Gkatzia | Stephanie Inglis | Leo Leppänen | Saad Mahamood | Stephanie Schoch | Craig Thomson | Luou Wen
Northern European Journal of Language Technology, Volume 9
Earlier research has shown that few studies in Natural Language Generation (NLG) evaluate their system outputs using an error analysis, despite known limitations of automatic evaluation metrics and human ratings. This position paper takes the stance that error analyses should be encouraged, and discusses several ways to do so. This paper is based on our shared experience as authors as well as a survey we distributed as a means of public consultation. We provide an overview of existing barriers to carrying out error analyses, and propose changes to improve error reporting in the NLG literature.
2021
Underreporting of errors in NLG output, and what to do about it
Emiel van Miltenburg | Miruna Clinciu | Ondřej Dušek | Dimitra Gkatzia | Stephanie Inglis | Leo Leppänen | Saad Mahamood | Emma Manning | Stephanie Schoch | Craig Thomson | Luou Wen
Proceedings of the 14th International Conference on Natural Language Generation
We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by ‘state-of-the-art’ research. In addition to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.
Contextualizing Variation in Text Style Transfer Datasets
Stephanie Schoch | Wanyu Du | Yangfeng Ji
Proceedings of the 14th International Conference on Natural Language Generation
Text style transfer involves rewriting the content of a source sentence in a target style. Despite there being a number of style tasks with available data, there has been limited systematic discussion of how text style datasets relate to each other. This understanding, however, is likely to have implications for selecting multiple data sources for model training. While it is prudent to consider inherent stylistic properties when determining these relationships, we also must consider how a style is realized in a particular dataset. In this paper, we conduct several empirical analyses of existing text style datasets. Based on our results, we propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.
2020
“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation
Stephanie Schoch | Diyi Yang | Yangfeng Ji
Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Despite recent efforts reviewing current human evaluation practices for natural language generation (NLG) research, the lack of reported question wording and potential for framing effects or cognitive biases influencing results has been widely overlooked. In this opinion paper, we detail three possible framing effects and cognitive biases that could be imposed on human evaluation in NLG. Based on this, we make a call for increased transparency for human evaluation in NLG and propose the concept of human evaluation statements. We make several recommendations for design details to report that could potentially influence results, such as question wording, and suggest that reporting pertinent design details can help increase comparability across studies as well as reproducibility of results.