Qiao Qiao
2025
Re-Examine Distantly Supervised NER: A New Benchmark and a Simple Approach
Yuepei Li | Kang Zhou | Qiao Qiao | Qing Wang | Qi Li
Proceedings of the 31st International Conference on Computational Linguistics
Distantly-Supervised Named Entity Recognition (DS-NER) uses knowledge bases or dictionaries for annotation, reducing manual effort but relying on large human-labeled validation sets. In this paper, we introduce a real-life DS-NER dataset, QTL, where the training data is annotated using domain dictionaries and the test data is annotated by domain experts. This dataset has a small validation set, reflecting real-life scenarios. Existing DS-NER approaches fail when applied to QTL, which motivates us to re-examine them. We find that many of them rely on large validation sets and that some inappropriately used the test set for tuning. To solve this issue, we propose a new approach, token-level Curriculum-based Positive-Unlabeled Learning (CuPUL), which uses curriculum learning to order training samples from easy to hard. This method stabilizes training, making it robust and effective on small validation sets. CuPUL also addresses false negative issues using the Positive-Unlabeled learning paradigm, demonstrating improved performance in real-life applications.
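The two ingredients named in this abstract, easy-to-hard ordering and Positive-Unlabeled learning, can be illustrated with a minimal sketch. It is not the CuPUL implementation: the abstract gives neither the difficulty measure nor the loss, so the `difficulty` callable, the `prior` value, and the use of the standard non-negative PU risk estimator are all assumptions made here for illustration.

```python
import math


def order_easy_to_hard(samples, difficulty):
    """Sort training samples by ascending difficulty.

    `difficulty` is a hypothetical scoring function; the abstract does not
    say how difficulty is measured.
    """
    return sorted(samples, key=difficulty)


def nn_pu_risk(pos_probs, unl_probs, prior):
    """Non-negative PU risk for a binary entity / non-entity token classifier.

    pos_probs: predicted P(entity) for dictionary-matched (positive) tokens
    unl_probs: predicted P(entity) for unlabeled tokens
    prior:     assumed fraction of true entity tokens among all tokens
    """
    eps = 1e-12  # avoids log(0)

    def mean(values):
        return sum(values) / len(values)

    # Cross-entropy of positives treated as positive / as negative,
    # and of unlabeled tokens treated as negative.
    pos_as_pos = mean([-math.log(p + eps) for p in pos_probs])
    pos_as_neg = mean([-math.log(1.0 - p + eps) for p in pos_probs])
    unl_as_neg = mean([-math.log(1.0 - p + eps) for p in unl_probs])

    # Unlabeled data contains hidden positives (false negatives), so their
    # estimated contribution is subtracted; clamping at zero keeps the
    # negative-class risk from going below zero.
    neg_risk = unl_as_neg - prior * pos_as_neg
    return prior * pos_as_pos + max(0.0, neg_risk)
```

In this sketch the classifier would be trained on batches drawn in the order returned by `order_easy_to_hard`, with `nn_pu_risk` as the objective; how the two are actually combined in CuPUL is described in the paper, not here.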
2024
GenDecider: Integrating “None of the Candidates” Judgments in Zero-Shot Entity Linking Re-ranking
Kang Zhou | Yuepei Li | Qing Wang | Qiao Qiao | Qi Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
We introduce GenDecider, a novel re-ranking approach for Zero-Shot Entity Linking (ZSEL), built on the Llama model. It innovatively detects scenarios where the correct entity is not among the retrieved candidates, a common oversight in existing re-ranking methods. By autoregressively generating outputs based on the context of the entity mention and the candidate entities, GenDecider significantly enhances disambiguation, improving the accuracy and reliability of ZSEL systems, as demonstrated on the benchmark ZESHEL dataset. Our code is available at https://github.com/kangISU/GenDecider.
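One way to picture a re-ranker that can answer "none of the candidates" is to make that answer an explicit option in the prompt and map the generated text back to a decision. The sketch below only covers prompt construction and answer parsing; the prompt wording, the option numbering, and the helper names `build_prompt` and `parse_choice` are hypothetical and are not taken from the GenDecider code, and no language model is called.

```python
import re

NONE_OPTION = 0  # hypothetical index reserved for "None of the candidates"


def build_prompt(mention, context, candidates):
    """Build a re-ranking prompt listing candidates plus an explicit none option.

    candidates: list of (title, description) pairs from a retrieval stage.
    """
    lines = [
        f"Context: {context}",
        f"Mention: {mention}",
        "Options:",
        f"{NONE_OPTION}. None of the candidates",
    ]
    for index, (title, description) in enumerate(candidates, start=1):
        lines.append(f"{index}. {title}: {description}")
    lines.append("Answer with the option number:")
    return "\n".join(lines)


def parse_choice(generated, candidates):
    """Map generated text to a candidate title, or None when no candidate fits."""
    match = re.search(r"\d+", generated)
    if match is None:
        return None  # unparseable output is treated as "none"
    index = int(match.group())
    if index == NONE_OPTION or index > len(candidates):
        return None
    return candidates[index - 1][0]
```

Returning `None` gives downstream code a single signal for "the correct entity was not retrieved", which a re-ranker that must always pick a candidate cannot express.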
2023
Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs
Qing Wang | Kang Zhou | Qiao Qiao | Yuepei Li | Qi Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Unsupervised relation extraction (URE) aims to extract relations between named entities from raw text without requiring manual annotations or pre-existing knowledge bases. Recent URE studies have placed notable emphasis on contrastive learning strategies for acquiring relation representations. However, these studies often overlook two important aspects: the inclusion of diverse positive pairs for contrastive learning and the exploration of appropriate loss functions. In this paper, we propose AugURE, which combines within-sentence pair augmentation with augmentation through cross-sentence pair extraction to increase the diversity of positive pairs and strengthen the discriminative power of contrastive learning. We also identify the limitation of noise-contrastive estimation (NCE) loss for relation representation learning and propose to apply margin loss for sentence pairs. Experiments on the NYT-FB and TACRED datasets demonstrate that the proposed relation representation learning and simple K-Means clustering achieve state-of-the-art performance.
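As a rough illustration of what a margin loss over sentence pairs can look like, the sketch below uses a common hinge formulation with cosine similarity: a positive pair should be more similar than a negative pair by at least a margin. The abstract does not give AugURE's exact loss, so the similarity function, the `margin` default, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_pair_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: zero once sim(anchor, positive) beats sim(anchor, negative) by `margin`.

    anchor, positive, negative: relation representations of three sentences,
    where (anchor, positive) is assumed to express the same relation.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

Unlike a softmax-based NCE objective, this loss stops contributing gradient for a pair once the margin is satisfied, so already well-separated pairs no longer affect training.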