Mining Health-related Cause-Effect Statements with High Precision at Large Scale
Ferdinand Schlatt, Dieter Bettin, Matthias Hagen, Benno Stein, Martin Potthast
Abstract
An efficient assessment of the health relatedness of text passages is important to mine the web at scale to conduct health sociological analyses or to develop a health search engine. We propose a new efficient and effective termhood score for predicting the health relatedness of phrases and sentences, which achieves 69% recall at over 90% precision on a web dataset with cause-effect statements. It is more effective than state-of-the-art medical entity linkers and as effective but much faster than BERT-based approaches. Using our method, we compile the Webis Medical CauseNet 2022, a new resource of 7.8 million health-related cause-effect statements such as “Studies show that stress induces insomnia” in which the cause (‘stress’) and effect (‘insomnia’) are labeled.
- Anthology ID:
- 2022.coling-1.167
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 1925–1936
- URL:
- https://aclanthology.org/2022.coling-1.167
- Bibkey:
- schlatt-etal-2022-mining
- Cite (ACL):
- Ferdinand Schlatt, Dieter Bettin, Matthias Hagen, Benno Stein, and Martin Potthast. 2022. Mining Health-related Cause-Effect Statements with High Precision at Large Scale. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1925–1936, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Mining Health-related Cause-Effect Statements with High Precision at Large Scale (Schlatt et al., COLING 2022)
- PDF:
- https://aclanthology.org/2022.coling-1.167.pdf
- Code:
- webis-de/coling-22
Export citation
@inproceedings{schlatt-etal-2022-mining,
    title = "Mining Health-related Cause-Effect Statements with High Precision at Large Scale",
    author = "Schlatt, Ferdinand and Bettin, Dieter and Hagen, Matthias and Stein, Benno and Potthast, Martin",
    editor = "Calzolari, Nicoletta and Huang, Chu-Ren and Kim, Hansaem and Pustejovsky, James and Wanner, Leo and Choi, Key-Sun and Ryu, Pum-Mo and Chen, Hsin-Hsi and Donatelli, Lucia and Ji, Heng and Kurohashi, Sadao and Paggio, Patrizia and Xue, Nianwen and Kim, Seokhwan and Hahm, Younggyun and He, Zhong and Lee, Tony Kyungil and Santus, Enrico and Bond, Francis and Na, Seung-Hoon",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.167",
    pages = "1925--1936",
    abstract = "An efficient assessment of the health relatedness of text passages is important to mine the web at scale to conduct health sociological analyses or to develop a health search engine. We propose a new efficient and effective termhood score for predicting the health relatedness of phrases and sentences, which achieves 69{\%} recall at over 90{\%} precision on a web dataset with cause-effect statements. It is more effective than state-of-the-art medical entity linkers and as effective but much faster than BERT-based approaches. Using our method, we compile the Webis Medical CauseNet 2022, a new resource of 7.8 million health-related cause-effect statements such as {``}Studies show that stress induces insomnia{''} in which the cause ({`}stress{'}) and effect ({`}insomnia{'}) are labeled.",
}