Hongyu Chen


2024

What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction
Hongyu Chen | Michael Roth | Agnieszka Falenska
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Authorship Profiling (AP) aims to predict the demographic attributes (such as gender and age) of authors based on their writing styles. Ever-improving models mean that this task is gaining interest and application possibilities. However, with greater use also comes the risk that authors are misclassified more frequently, and it remains unclear to what extent the better models can capture the bias and who is affected by the models’ mistakes. In this paper, we investigate three established datasets for AP as well as classical and neural classifiers for this task. Our analyses show that it is often possible to predict the demographic information of the authors based on textual features. However, some features learned by the models are specific to datasets. Moreover, models are prone to errors based on stereotypes associated with topical bias.