PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning

Yaoshu Wang; Mengyi Yan; Wei Wang

doi:10.18653/v1/2025.findings-emnlp.1336

Abstract

Entity resolution (ER) is a fundamental problem in data management that aims to identify all duplicate entries within collections of multi-attribute tuples. Most existing work focuses on supervised learning and relies on large amounts of meticulously prepared, high-quality labeled data, including both positive and negative tuple pairs. In practice, however, manual annotation is labor-intensive; in particular, selecting high-quality negative pairs for labeling is both important and challenging. In this paper, we propose PUER, an end-to-end solution for low-resource ER that leverages Large Language Models (LLMs) in a Positive-Unlabeled (PU) learning setting, where only a small number of positively labeled examples (e.g., 50) and unlabeled data are provided. Rather than directly fine-tuning LLMs in a supervised manner, we solve the entity matching task with reinforcement learning (RL) and propose a self-adaptive reward function for the RL process. To enhance performance, we design an iterative workflow based on the co-training mechanism that fully utilizes the entity blocking component to assist entity matching. This workflow aims to improve the robustness and quality of pseudo-labels and thereby the performance of entity matching. Comprehensive experimental results on various benchmark datasets demonstrate the superiority of PUER. The full version and code are available.

Anthology ID: 2025.findings-emnlp.1336
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24567–24579
URL: https://aclanthology.org/2025.findings-emnlp.1336/
DOI: 10.18653/v1/2025.findings-emnlp.1336
Cite (ACL): Yaoshu Wang, Mengyi Yan, and Wei Wang. 2025. PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24567–24579, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning (Wang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1336.pdf
Checklist: 2025.findings-emnlp.1336.checklist.pdf
