Mining Health-related Cause-Effect Statements with High Precision at Large Scale

Ferdinand Schlatt, Dieter Bettin, Matthias Hagen, Benno Stein, Martin Potthast


Abstract
Efficiently assessing the health relatedness of text passages is important for mining the web at scale, whether to conduct health-sociological analyses or to develop a health search engine. We propose a new efficient and effective termhood score for predicting the health relatedness of phrases and sentences, which achieves 69% recall at over 90% precision on a web dataset of cause-effect statements. It is more effective than state-of-the-art medical entity linkers and as effective as, but much faster than, BERT-based approaches. Using our method, we compile the Webis Medical CauseNet 2022, a new resource of 7.8 million health-related cause-effect statements such as “Studies show that stress induces insomnia”, in which the cause (‘stress’) and effect (‘insomnia’) are labeled.
Anthology ID:
2022.coling-1.167
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1925–1936
URL:
https://aclanthology.org/2022.coling-1.167
Cite (ACL):
Ferdinand Schlatt, Dieter Bettin, Matthias Hagen, Benno Stein, and Martin Potthast. 2022. Mining Health-related Cause-Effect Statements with High Precision at Large Scale. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1925–1936, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Mining Health-related Cause-Effect Statements with High Precision at Large Scale (Schlatt et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.167.pdf
Code:
webis-de/coling-22