DuRE: Dual Contrastive Self Training for Semi-Supervised Relation Extraction

Yuxi Feng, Laks Lakshmanan


Abstract
Document-level Relation Extraction (RE) aims to extract relation triples from documents. Existing document-level RE models typically rely on supervised learning, which requires substantial labeled data. To reduce the amount of human supervision, Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of large pre-trained models when labeled data is insufficient. However, existing ST methods in RE fail to tackle the challenge of long-tail relations. In this work, we propose DuRE, a novel ST framework that tackles these problems. DuRE jointly models RE classification and text generation as a dual process. In this way, our model can construct and utilize both pseudo text generated from given labels and pseudo labels predicted from available unlabeled text, which are gradually refined during the ST phase. We propose a contrastive loss that leverages the signal of the RE classifier to improve generation quality. In addition, we propose a self-adaptive way to sample pseudo text from different relation classes. Experiments on two document-level RE tasks show that DuRE significantly boosts recall and F1 score with comparable precision against several strong baselines, especially for long-tail relations.
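
Illustration: the following is a minimal, hypothetical sketch of one self-training round in the spirit of the abstract, not the authors' implementation. The helper names (simple_classifier, simple_generator), the confidence thresholds, the inverse-frequency budgeting rule, and the classifier-agreement filter (a stand-in for the paper's contrastive loss) are all assumptions made for illustration.

# Hypothetical sketch of one DuRE-style self-training round, based only on the
# abstract above; all names and thresholds are illustrative assumptions.
import random
from collections import Counter

def pseudo_label(classifier, unlabeled_texts, threshold=0.9):
    """Predict pseudo labels for unlabeled text; keep only confident ones."""
    kept = []
    for text in unlabeled_texts:
        label, confidence = classifier(text)
        if confidence >= threshold:
            kept.append((text, label))
    return kept

def adaptive_class_budget(labeled_pairs, total_budget=100):
    """Allocate more pseudo-text generations to rare (long-tail) relation classes."""
    counts = Counter(label for _, label in labeled_pairs)
    inverse = {c: 1.0 / counts[c] for c in counts}
    norm = sum(inverse.values())
    return {c: max(1, round(total_budget * inverse[c] / norm)) for c in inverse}

def pseudo_text(generator, classifier, budgets, threshold=0.9):
    """Generate pseudo text per relation class; keep samples the classifier
    agrees with (a crude stand-in for the contrastive signal in the paper)."""
    kept = []
    for relation, budget in budgets.items():
        for _ in range(budget):
            text = generator(relation)
            predicted, confidence = classifier(text)
            if predicted == relation and confidence >= threshold:
                kept.append((text, relation))
    return kept

# Toy stand-ins so the sketch runs end to end.
def simple_classifier(text):
    label = "born_in" if "born" in text else "works_for"
    return label, random.uniform(0.8, 1.0)

def simple_generator(relation):
    return "Alice was born in Paris." if relation == "born_in" else "Alice works for Acme."

if __name__ == "__main__":
    labeled = [("Bob was born in Lima.", "born_in")] * 9 + [("Bob works for Acme.", "works_for")]
    unlabeled = ["Carol was born in Rome.", "Dan works for Initech."]
    budgets = adaptive_class_budget(labeled, total_budget=10)
    augmented = (labeled
                 + pseudo_label(simple_classifier, unlabeled)
                 + pseudo_text(simple_generator, simple_classifier, budgets))
    print(f"{len(augmented)} training pairs after one round; budgets={budgets}")

Note that the inverse-frequency budget gives the rarer relation class the larger generation quota, which mirrors (in a much simplified form) the self-adaptive sampling idea described in the abstract.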
Anthology ID:
2024.naacl-long.30
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
540–555
URL:
https://aclanthology.org/2024.naacl-long.30
Cite (ACL):
Yuxi Feng and Laks Lakshmanan. 2024. DuRE: Dual Contrastive Self Training for Semi-Supervised Relation Extraction. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 540–555, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
DuRE: Dual Contrastive Self Training for Semi-Supervised Relation Extraction (Feng & Lakshmanan, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.30.pdf
Copyright:
2024.naacl-long.30.copyright.pdf