Meiguo Wang
2023
UseClean: learning from complex noisy labels in named entity recognition
Jinjin Tian | Kun Zhou | Meiguo Wang | Yu Zhang | Benjamin Yao | Xiaohu Liu | Chenlei Guo
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
We investigate and refine denoising methods for the NER task on data that potentially contains extremely noisy labels from multiple sources. In this paper, we first summarize all possible noise types and noise generation schemes, based on which we build a thorough evaluation system. We then pinpoint the bottleneck of current state-of-the-art denoising methods using our evaluation system. Correspondingly, we propose several refinements: a two-stage framework to avoid error accumulation; a novel confidence score utilizing minimal clean supervision to increase predictive power; automatic cutoff fitting to save extensive hyper-parameter tuning; and a warm-started weighted partial CRF to better learn on the noisy tokens. Additionally, we propose to use adaptive sampling to further boost performance in long-tailed entity settings. Our method improves F1 score by at least 5 to 10% on average over the current state of the art across extensive experiments.
2022
Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations
Meiguo Wang | Benjamin Yao | Bin Guo | Xiaohu Liu | Yu Zhang | Tuan-Hung Pham | Chenlei Guo
Proceedings of the 29th International Conference on Computational Linguistics
To evaluate the performance of a multi-domain goal-oriented Dialogue System (DS), it is important to understand what the users’ goals are for the conversations and whether those goals are successfully achieved. The success rate of goals directly correlates with user satisfaction and perceived usefulness of the DS. In this paper, we propose a novel automatic dialogue evaluation framework that jointly performs two tasks: goal segmentation and goal success prediction. We extend the RoBERTa-IQ model (Gupta et al., 2021) by adding multi-task learning heads for goal segmentation and success prediction. Using an annotated dataset from a commercial DS, we demonstrate that our proposed model reaches an accuracy that is on par with single-pass human annotation when compared against a three-pass gold annotation benchmark.