Misinformation Detection in the Wild: News Source Classification as a Proxy for Non-article Texts

Matyas Bohacek


Abstract
Creating disinformation classifiers is time-consuming and expensive, and requires substantial effort from experts across different fields. Even when these efforts succeed, their roll-out to publicly available applications stagnates. While these models struggle to reach consumers, disinformation behavior online evolves rapidly. Hoaxes are shared in various abbreviated forms on social networks, often in user-restricted areas, making external monitoring and intervention virtually impossible. To re-purpose existing NLP methods for this new paradigm of sharing misinformation, we propose using information about a text’s originating news source as a proxy for that text’s trustworthiness. We first present a methodology for determining sources’ overall credibility. We demonstrate our pipeline construction in a specific language and introduce CNSC: a novel dataset for news source and source credibility classification of Czech articles. We establish initial benchmarks on multiple architectures. Lastly, we create in-the-wild wrapper applications of the trained models: a chatbot, a browser extension, and a standalone web application.
Anthology ID:
2022.nlp4pi-1.10
Volume:
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Laura Biester, Dorottya Demszky, Zhijing Jin, Mrinmaya Sachan, Joel Tetreault, Steven Wilson, Lu Xiao, Jieyu Zhao
Venue:
NLP4PI
Publisher:
Association for Computational Linguistics
Pages:
79–88
URL:
https://aclanthology.org/2022.nlp4pi-1.10
DOI:
10.18653/v1/2022.nlp4pi-1.10
Cite (ACL):
Matyas Bohacek. 2022. Misinformation Detection in the Wild: News Source Classification as a Proxy for Non-article Texts. In Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI), pages 79–88, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Misinformation Detection in the Wild: News Source Classification as a Proxy for Non-article Texts (Bohacek, NLP4PI 2022)
PDF:
https://aclanthology.org/2022.nlp4pi-1.10.pdf
Video:
https://aclanthology.org/2022.nlp4pi-1.10.mp4