Finding Influential Instances for Distantly Supervised Relation Extraction
Zifeng Wang, Rui Wen, Xi Chen, Shao-Lun Huang, Ningyu Zhang, Yefeng Zheng
Abstract
Distant supervision (DS) is a strong way to expand the datasets for enhancing relation extraction (RE) models but often suffers from high label noise. Current works based on attention, reinforcement learning, or GAN are black-box models so they neither provide meaningful interpretation of sample selection in DS nor stability on different domains. On the contrary, this work proposes a novel model-agnostic instance sampling method for DS by influence function (IF), namely REIF. Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from 𝒪(mn) to 𝒪(1), with analyzing its robustness on the selected sampling function. Experiments show that by simply sampling the favorable instances during training, REIF is able to win over a series of baselines which have complicated architectures. We also demonstrate that REIF can support interpretable instance selection.- Anthology ID:
- 2022.coling-1.233
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- SIG:
- Publisher:
- International Committee on Computational Linguistics
- Note:
- Pages:
- 2639–2650
- Language:
- URL:
- https://aclanthology.org/2022.coling-1.233
- DOI:
- Bibkey:
- Cite (ACL):
- Zifeng Wang, Rui Wen, Xi Chen, Shao-Lun Huang, Ningyu Zhang, and Yefeng Zheng. 2022. Finding Influential Instances for Distantly Supervised Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2639–2650, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Finding Influential Instances for Distantly Supervised Relation Extraction (Wang et al., COLING 2022)
- Copy Citation:
- PDF:
- https://aclanthology.org/2022.coling-1.233.pdf
Export citation
@inproceedings{wang-etal-2022-finding, title = "Finding Influential Instances for Distantly Supervised Relation Extraction", author = "Wang, Zifeng and Wen, Rui and Chen, Xi and Huang, Shao-Lun and Zhang, Ningyu and Zheng, Yefeng", editor = "Calzolari, Nicoletta and Huang, Chu-Ren and Kim, Hansaem and Pustejovsky, James and Wanner, Leo and Choi, Key-Sun and Ryu, Pum-Mo and Chen, Hsin-Hsi and Donatelli, Lucia and Ji, Heng and Kurohashi, Sadao and Paggio, Patrizia and Xue, Nianwen and Kim, Seokhwan and Hahm, Younggyun and He, Zhong and Lee, Tony Kyungil and Santus, Enrico and Bond, Francis and Na, Seung-Hoon", booktitle = "Proceedings of the 29th International Conference on Computational Linguistics", month = oct, year = "2022", address = "Gyeongju, Republic of Korea", publisher = "International Committee on Computational Linguistics", url = "https://aclanthology.org/2022.coling-1.233", pages = "2639--2650", abstract = "Distant supervision (DS) is a strong way to expand the datasets for enhancing relation extraction (RE) models but often suffers from high label noise. Current works based on attention, reinforcement learning, or GAN are black-box models so they neither provide meaningful interpretation of sample selection in DS nor stability on different domains. On the contrary, this work proposes a novel model-agnostic instance sampling method for DS by influence function (IF), namely REIF. Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from $\mathcal{O}(mn)$ to $\mathcal{O}(1)$, with analyzing its robustness on the selected sampling function. Experiments show that by simply sampling the favorable instances during training, REIF is able to win over a series of baselines which have complicated architectures. We also demonstrate that REIF can support interpretable instance selection.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="wang-etal-2022-finding"> <titleInfo> <title>Finding Influential Instances for Distantly Supervised Relation Extraction</title> </titleInfo> <name type="personal"> <namePart type="given">Zifeng</namePart> <namePart type="family">Wang</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rui</namePart> <namePart type="family">Wen</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Xi</namePart> <namePart type="family">Chen</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shao-Lun</namePart> <namePart type="family">Huang</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ningyu</namePart> <namePart type="family">Zhang</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yefeng</namePart> <namePart type="family">Zheng</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2022-10</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 29th International Conference on Computational Linguistics</title> </titleInfo> <name type="personal"> <namePart type="given">Nicoletta</namePart> <namePart type="family">Calzolari</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Chu-Ren</namePart> <namePart type="family">Huang</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hansaem</namePart> <namePart type="family">Kim</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">James</namePart> <namePart type="family">Pustejovsky</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Leo</namePart> <namePart type="family">Wanner</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Key-Sun</namePart> <namePart type="family">Choi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Pum-Mo</namePart> <namePart type="family">Ryu</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hsin-Hsi</namePart> <namePart type="family">Chen</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lucia</namePart> <namePart type="family">Donatelli</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Heng</namePart> <namePart type="family">Ji</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sadao</namePart> <namePart type="family">Kurohashi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Patrizia</namePart> <namePart type="family">Paggio</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Nianwen</namePart> <namePart type="family">Xue</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Seokhwan</namePart> <namePart type="family">Kim</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Younggyun</namePart> <namePart type="family">Hahm</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Zhong</namePart> <namePart type="family">He</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tony</namePart> <namePart type="given">Kyungil</namePart> <namePart type="family">Lee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Enrico</namePart> <namePart type="family">Santus</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Francis</namePart> <namePart type="family">Bond</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Seung-Hoon</namePart> <namePart type="family">Na</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>International Committee on Computational Linguistics</publisher> <place> <placeTerm type="text">Gyeongju, Republic of Korea</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>Distant supervision (DS) is a strong way to expand the datasets for enhancing relation extraction (RE) models but often suffers from high label noise. Current works based on attention, reinforcement learning, or GAN are black-box models so they neither provide meaningful interpretation of sample selection in DS nor stability on different domains. On the contrary, this work proposes a novel model-agnostic instance sampling method for DS by influence function (IF), namely REIF. Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from \mathcalO(mn) to \mathcalO(1), with analyzing its robustness on the selected sampling function. Experiments show that by simply sampling the favorable instances during training, REIF is able to win over a series of baselines which have complicated architectures. We also demonstrate that REIF can support interpretable instance selection.</abstract> <identifier type="citekey">wang-etal-2022-finding</identifier> <location> <url>https://aclanthology.org/2022.coling-1.233</url> </location> <part> <date>2022-10</date> <extent unit="page"> <start>2639</start> <end>2650</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T Finding Influential Instances for Distantly Supervised Relation Extraction %A Wang, Zifeng %A Wen, Rui %A Chen, Xi %A Huang, Shao-Lun %A Zhang, Ningyu %A Zheng, Yefeng %Y Calzolari, Nicoletta %Y Huang, Chu-Ren %Y Kim, Hansaem %Y Pustejovsky, James %Y Wanner, Leo %Y Choi, Key-Sun %Y Ryu, Pum-Mo %Y Chen, Hsin-Hsi %Y Donatelli, Lucia %Y Ji, Heng %Y Kurohashi, Sadao %Y Paggio, Patrizia %Y Xue, Nianwen %Y Kim, Seokhwan %Y Hahm, Younggyun %Y He, Zhong %Y Lee, Tony Kyungil %Y Santus, Enrico %Y Bond, Francis %Y Na, Seung-Hoon %S Proceedings of the 29th International Conference on Computational Linguistics %D 2022 %8 October %I International Committee on Computational Linguistics %C Gyeongju, Republic of Korea %F wang-etal-2022-finding %X Distant supervision (DS) is a strong way to expand the datasets for enhancing relation extraction (RE) models but often suffers from high label noise. Current works based on attention, reinforcement learning, or GAN are black-box models so they neither provide meaningful interpretation of sample selection in DS nor stability on different domains. On the contrary, this work proposes a novel model-agnostic instance sampling method for DS by influence function (IF), namely REIF. Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from \mathcalO(mn) to \mathcalO(1), with analyzing its robustness on the selected sampling function. Experiments show that by simply sampling the favorable instances during training, REIF is able to win over a series of baselines which have complicated architectures. We also demonstrate that REIF can support interpretable instance selection. %U https://aclanthology.org/2022.coling-1.233 %P 2639-2650
Markdown (Informal)
[Finding Influential Instances for Distantly Supervised Relation Extraction](https://aclanthology.org/2022.coling-1.233) (Wang et al., COLING 2022)
- Finding Influential Instances for Distantly Supervised Relation Extraction (Wang et al., COLING 2022)
ACL
- Zifeng Wang, Rui Wen, Xi Chen, Shao-Lun Huang, Ningyu Zhang, and Yefeng Zheng. 2022. Finding Influential Instances for Distantly Supervised Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2639–2650, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.