CETA: A Consensus Enhanced Training Approach for Denoising in Distantly Supervised Relation Extraction

Ruri Liu, Shasha Mo, Jianwei Niu, Shengda Fan


Abstract
Distantly supervised relation extraction (DSRE) aims to extract relational facts from text but suffers from noisy instances. Existing methods usually select reliable sentences based on potentially noisy labels, and therefore either wrongly select many noisy training instances or underutilize a large amount of valuable training data. This paper proposes a sentence-level DSRE method that goes beyond typical instance-selection approaches by preventing samples from falling into the wrong classification region of the feature space. Specifically, a denoising theorem and its corresponding implementation, named the Consensus Enhanced Training Approach (CETA), are proposed. Training with CETA separates samples of different classes and closely clusters samples of the same class in the feature space, so the model can establish a robust classification boundary that prevents noisy labels from biasing wrongly labeled samples into the wrong classification region. This is achieved by enhancing the classification consensus between two discrepant classifiers and does not depend on any potentially noisy labels, thereby avoiding the two limitations above. Extensive experiments on widely used benchmarks demonstrate that CETA significantly outperforms previous methods and achieves new state-of-the-art results.
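The abstract only names the core mechanism (enhancing classification consensus between two discrepant classifiers on a shared feature space), not its exact formulation. The sketch below is a minimal, hypothetical illustration of that idea: two independently initialized classifier heads over one encoder, trained with a label-free agreement term alongside the usual supervised loss. All names (TwoHeadClassifier, consensus_loss, lam) and the choice of symmetric KL as the agreement measure are assumptions for illustration, not the paper's actual losses.

```python
# Hypothetical sketch of consensus-enhanced training with two discrepant
# classifier heads; the loss terms are illustrative assumptions, not the
# paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_relations: int):
        super().__init__()
        self.encoder = encoder  # shared sentence encoder (e.g., a BERT-style model)
        # Two independently initialized heads, so their decision boundaries differ.
        self.head_a = nn.Linear(hidden_dim, num_relations)
        self.head_b = nn.Linear(hidden_dim, num_relations)

    def forward(self, inputs):
        features = self.encoder(inputs)  # (batch, hidden_dim) sentence representations
        return self.head_a(features), self.head_b(features)

def consensus_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between the two heads' predictive
    distributions; minimizing it pushes the heads toward agreement
    without using any (possibly noisy) labels."""
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

def training_step(model, inputs, labels, lam: float = 1.0) -> torch.Tensor:
    logits_a, logits_b = model(inputs)
    # Standard supervised loss on the (possibly noisy) distant labels.
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    # Label-free consensus term, weighted by the hypothetical coefficient lam.
    return ce + lam * consensus_loss(logits_a, logits_b)
```

Intuitively, a sample whose distant label conflicts with the feature-space evidence tends to make the two heads disagree; penalizing that disagreement keeps such samples from being pulled across the classification boundary, which matches the abstract's description at a high level.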
Anthology ID:
2022.coling-1.197
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2247–2258
URL:
https://aclanthology.org/2022.coling-1.197
Cite (ACL):
Ruri Liu, Shasha Mo, Jianwei Niu, and Shengda Fan. 2022. CETA: A Consensus Enhanced Training Approach for Denoising in Distantly Supervised Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2247–2258, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
CETA: A Consensus Enhanced Training Approach for Denoising in Distantly Supervised Relation Extraction (Liu et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.197.pdf
Code:
ethan-rr/ceta