WikiBias as an Extrapolation Corpus for Bias Detection

K. Salas-Jimenez, Francisco Fernando Lopez-Ponce, Sergio-Luis Ojeda-Trueba, Gemma Bel-Enguix


Abstract
This paper explores whether a machine learning model trained on Wikipedia data can detect subjectivity in sentences and generalize effectively to other domains. To this end, we performed experiments with the WikiBias corpus, the BABE corpus, and the CheckThat! dataset. Various classical ML models were tested, including Logistic Regression, SVC, and SVR, with features such as Sentence Transformers similarity, probabilistic sentiment measures, and biased lexicons. Pre-trained models like DistilRoBERTa, as well as large language models like Gemma and GPT-4, were also tested on the same classification task.
Anthology ID:
2024.wikinlp-1.10
Volume:
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Lucie-Aimée Kaffee, Angela Fan, Tajuddeen Gwadabe, Isaac Johnson, Fabio Petroni, Daniel van Strien
Venue:
WikiNLP
Publisher:
Association for Computational Linguistics
Pages:
46–52
URL:
https://aclanthology.org/2024.wikinlp-1.10
Cite (ACL):
K. Salas-Jimenez, Francisco Fernando Lopez-Ponce, Sergio-Luis Ojeda-Trueba, and Gemma Bel-Enguix. 2024. WikiBias as an Extrapolation Corpus for Bias Detection. In Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia, pages 46–52, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
WikiBias as an Extrapolation Corpus for Bias Detection (Salas-Jimenez et al., WikiNLP 2024)
PDF:
https://aclanthology.org/2024.wikinlp-1.10.pdf