What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction

Hongyu Chen, Michael Roth, Agnieszka Falenska


Abstract
Authorship Profiling (AP) aims to predict the demographic attributes (such as gender and age) of authors based on their writing styles. Ever-improving models mean that this task is gaining interest and application possibilities. However, with greater use also comes the risk that authors are misclassified more frequently, and it remains unclear to what extent the better models can capture the bias and who is affected by the models’ mistakes. In this paper, we investigate three established datasets for AP as well as classical and neural classifiers for this task. Our analyses show that it is often possible to predict the demographic information of the authors based on textual features. However, some features learned by the models are specific to datasets. Moreover, models are prone to errors based on stereotypes associated with topical bias.
Anthology ID:
2024.gebnlp-1.9
Volume:
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, Debora Nozza
Venues:
GeBNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
150–166
URL:
https://aclanthology.org/2024.gebnlp-1.9
Cite (ACL):
Hongyu Chen, Michael Roth, and Agnieszka Falenska. 2024. What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction. In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 150–166, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction (Chen et al., GeBNLP-WS 2024)
PDF:
https://aclanthology.org/2024.gebnlp-1.9.pdf