Pengshuai Li


2021

pdf bib
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Xinsong Zhang | Pengshuai Li | Hang Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Active Testing: An Unbiased Evaluation Method for Distantly Supervised Relation Extraction
Pengshuai Li | Xinsong Zhang | Weijia Jia | Wei Zhao
Findings of the Association for Computational Linguistics: EMNLP 2020

Distant supervision has been a widely used method for neural relation extraction for its convenience of automatically labeling datasets. However, existing works on distantly supervised relation extraction suffer from the low quality of test set, which leads to considerable biased performance evaluation. These biases not only result in unfair evaluations but also mislead the optimization of neural relation extraction. To mitigate this problem, we propose a novel evaluation method named active testing through utilizing both the noisy test set and a few manual annotations. Experiments on a widely used benchmark show that our proposed approach can yield approximately unbiased evaluations for distantly supervised relation extractors.

2019

pdf bib
GAN Driven Semi-distant Supervision for Relation Extraction
Pengshuai Li | Xinsong Zhang | Weijia Jia | Hai Zhao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Distant supervision has been widely used in relation extraction tasks without hand-labeled datasets recently. However, the automatically constructed datasets comprise numbers of wrongly labeled negative instances due to the incompleteness of knowledge bases, which is neglected by current distant supervised methods resulting in seriously misleading in both training and testing processes. To address this issue, we propose a novel semi-distant supervision approach for relation extraction by constructing a small accurate dataset and properly leveraging numerous instances without relation labels. In our approach, we construct accurate instances by both knowledge base and entity descriptions determined to avoid wrong negative labeling and further utilize unlabeled instances sufficiently using generative adversarial network (GAN) framework. Experimental results on real-world datasets show that our approach can achieve significant improvements in distant supervised relation extraction over strong baselines.