Gender Identification in Brazilian Portuguese Product Reviews: A Comparative Study of Classical Models, BERT, and LLMs

Tiago de Melo, Carlos M. S. Figueiredo


Abstract
This study analyzes gender identification in Brazilian Portuguese using Amazon reviews drawn from ten product categories. Nine models were evaluated: three classical classifiers (Logistic Regression, Random Forest, and SVM), a multilingual BERT, and five LLMs (ChatGPT 4o, ChatGPT 3.5, DeepSeek, Sabia3, and Sabiazinho). Experiments show that BERT achieved the best performance (macro-F1 = 0.634), outperforming ChatGPT 4o and Logistic Regression by less than one percentage point. Reviews authored by women reach an average F1 of 0.654—four points higher than those by men. Performance also varies by domain: books and automotive are easier, whereas baby and pets are more challenging.
Anthology ID:
2026.propor-1.2
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
11–19
URL:
https://aclanthology.org/2026.propor-1.2/
Cite (ACL):
Tiago de Melo and Carlos M. S. Figueiredo. 2026. Gender Identification in Brazilian Portuguese Product Reviews: A Comparative Study of Classical Models, BERT, and LLMs. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 11–19, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
Gender Identification in Brazilian Portuguese Product Reviews: A Comparative Study of Classical Models, BERT, and LLMs (Melo & Figueiredo, PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-1.2.pdf