PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning

Yaoshu Wang, Mengyi Yan, Wei Wang


Abstract
Entity resolution (ER) is a fundamental problem in data management that aims to identify all duplicate entries within collections of multi-attribute tuples. Most existing works focus on supervised learning, relying on large amounts of high-quality labeled data, including meticulously prepared positive and negative tuple pairs. In reality, however, manual annotation is labor-intensive; in particular, selecting high-quality negative examples for labeling is both important and challenging. In this paper, we propose PUER, an end-to-end ER solution that addresses low-resource ER by leveraging Large Language Models (LLMs) in a Positive-Unlabeled (PU) learning setting, where only a small number of positively labeled examples (e.g., 50) and unlabeled data are provided. Rather than directly fine-tuning LLMs in a supervised manner, we solve the entity matching task with reinforcement learning (RL) and propose a self-adaptive reward function for the RL process. To further enhance performance, we design an iterative workflow based on a co-training mechanism that fully utilizes the entity blocking component to assist entity matching. This workflow improves the robustness and quality of pseudo-labels, which in turn improves entity matching performance. Comprehensive experimental results on various benchmark datasets demonstrate the superiority of PUER. The full version and code are available.
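To make the co-training idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation; all function names, scores, and thresholds are invented for illustration): two "views" of a pair, a cheap blocker similarity and a matcher score, and a pair is pseudo-labeled only when both views agree with high confidence.

```python
# Hypothetical sketch of one co-training round for PU entity resolution.
# A blocker and a matcher each score unlabeled pairs; pairs on which
# both views agree confidently become pseudo-labels for the next round.

def blocker_score(pair):
    """Stand-in for the blocking component: token Jaccard similarity."""
    a, b = (set(s.split()) for s in pair)
    return len(a & b) / max(len(a | b), 1)

def matcher_score(pair):
    """Stand-in for the RL-tuned matcher's match probability."""
    a, b = pair
    return 1.0 if a == b else blocker_score(pair)

def co_train_round(unlabeled, hi=0.8, lo=0.2):
    """Keep only pairs where both views agree with high confidence."""
    pseudo_pos, pseudo_neg = [], []
    for pair in unlabeled:
        b, m = blocker_score(pair), matcher_score(pair)
        if b >= hi and m >= hi:
            pseudo_pos.append(pair)   # confident pseudo-positive
        elif b <= lo and m <= lo:
            pseudo_neg.append(pair)   # confident pseudo-negative
        # ambiguous pairs stay unlabeled for later rounds
    return pseudo_pos, pseudo_neg

pairs = [("ipad pro 11", "ipad pro 11"),
         ("galaxy s21", "iphone 13"),
         ("ipad pro 11 2021", "ipad pro 11")]
pos, neg = co_train_round(pairs)
```

In this toy run, the exact-duplicate pair becomes a pseudo-positive, the dissimilar pair a pseudo-negative, and the ambiguous third pair is left unlabeled; in the paper's actual workflow the pseudo-labels would then feed the next RL fine-tuning iteration.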
Anthology ID:
2025.findings-emnlp.1336
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24567–24579
URL:
https://aclanthology.org/2025.findings-emnlp.1336/
Cite (ACL):
Yaoshu Wang, Mengyi Yan, and Wei Wang. 2025. PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24567–24579, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning (Wang et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1336.pdf
Checklist:
 2025.findings-emnlp.1336.checklist.pdf