Xuefei Cao


2023

pdf bib
We Need to Talk About Reproducibility in NLP Model Comparison
Yan Xue | Xuefei Cao | Xingli Yang | Yu Wang | Ruibo Wang | Jihong Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

NLPers frequently face reproducibility crisis in a comparison of various models of a real-world NLP task. Many studies have empirically showed that the standard splits tend to produce low reproducible and unreliable conclusions, and they attempted to improve the splits by using more random repetitions. However, the improvement on the reproducibility in a comparison of NLP models is limited attributed to a lack of investigation on the relationship between the reproducibility and the estimator induced by a splitting strategy. In this paper, we formulate the reproducibility in a model comparison into a probabilistic function with regard to a conclusion. Furthermore, we theoretically illustrate that the reproducibility is qualitatively dominated by the signal-to-noise ratio (SNR) of a model performance estimator obtained on a corpus splitting strategy. Specifically, a higher value of the SNR of an estimator probably indicates a better reproducibility. On the basis of the theoretical motivations, we develop a novel mixture estimator of the performance of an NLP model with a regularized corpus splitting strategy based on a blocked 3× 2 cross-validation. We conduct numerical experiments on multiple NLP tasks to show that the proposed estimator achieves a high SNR, and it substantially increases the reproducibility. Therefore, we recommend the NLP practitioners to use the proposed method to compare NLP models instead of the methods based on the widely-used standard splits and the random splits with multiple repetitions.

pdf bib
基于BiLSTM聚合模型的汉语框架语义角色识别(Chinese Frame Semantic Role Identification Based on BiLSTM Aggregation Model)
Xuefei Cao (曹学飞) | Hongji Li (李济洪) | Ruibo Wang (王瑞波) | Qian Niu (牛倩)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“目前,基于神经网络的汉语框架语义角色识别模型的性能依然较低,考虑到神经网络模型的性能受到超参数的影响,本文将超参数调优和模型预测性能的提升统一到基于BiLSTM的聚合模型框架下解决。使用正则化交叉验证进行实验,通过正则化条件约束训练集和验证集的分布差异,避免分布不一致带来的性能波动。将交叉验证得到的结果进行众数投票,以投票后的结果对不同的超参数配置进行评估,并选择若干种没有显著差异的超参数配置构成最优的超参数配置集合。然后将最优的超参数配置集合对应的子模型进行聚合,构造汉语框架语义角色识别的聚合模型。实验结果显示,本文方法的性能较基准模型显著提升了9.56%。”