ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese

Luis A. Duarte, Pedro Giacomin, Vitória Bispo, Mariana O. Silva, Adriano C. M. Pereira, Gisele L. Pappa


Abstract
We present ConsumerBR, a large-scale corpus of consumer complaints and company responses in Brazilian Portuguese, compiled from publicly available data on the Consumidor.gov.br platform. The corpus comprises over 3.1 million consumer–company interactions collected between 2021 and 2025 and combines anonymized textual content with rich structured metadata, including temporal information, complaint outcomes, and consumer satisfaction indicators. We describe a data collection strategy tailored to the platform’s dynamic interface, a preprocessing pipeline that includes response clustering to identify template-based replies, and a hybrid anonymization approach designed to mitigate privacy risks. We also provide a detailed statistical characterization of the corpus, highlighting its scale, coverage, and distributional properties. ConsumerBR is publicly available for research purposes and supports a wide range of applications, including complaint analysis, sentiment modeling, dialogue and response generation, and preference-based evaluation.
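As a rough illustration of what identifying template-based replies can involve, the sketch below groups company responses that become identical after simple normalization (lowercasing, masking digit runs, collapsing whitespace). This is an assumption made for illustration only: the abstract does not specify the clustering method used for ConsumerBR, and the function names, the `min_size` parameter, and the sample replies are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, mask digit runs, and collapse whitespace so that replies
    differing only in case, numbers, or spacing map to the same key."""
    text = text.lower()
    text = re.sub(r"\d+", "<num>", text)
    return re.sub(r"\s+", " ", text).strip()


def group_template_replies(replies, min_size=2):
    """Return lists of reply indices whose normalized text is identical.

    Hypothetical helper: exact matching after normalization is a simple
    stand-in, not the clustering procedure described in the paper.
    """
    groups = defaultdict(list)
    for idx, reply in enumerate(replies):
        groups[normalize(reply)].append(idx)
    # Keep only groups large enough to suggest a reused template.
    return [ids for ids in groups.values() if len(ids) >= min_size]


# Invented sample replies, not taken from the corpus.
replies = [
    "Prezado cliente, seu protocolo 12345 foi registrado.",
    "Prezado cliente,  seu protocolo 98765 foi registrado.",
    "Informamos que o reembolso foi efetuado em sua conta.",
]
template_groups = group_template_replies(replies)
```

Here the first two replies differ only in the protocol number and spacing, so they fall into one group, while the third stays unique. A real pipeline would likely need fuzzy or embedding-based similarity to catch templates with longer variable slots.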
Anthology ID:
2026.propor-1.66
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
667–675
URL:
https://aclanthology.org/2026.propor-1.66/
Cite (ACL):
Luis A. Duarte, Pedro Giacomin, Vitória Bispo, Mariana O. Silva, Adriano C. M. Pereira, and Gisele L. Pappa. 2026. ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 667–675, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese (Duarte et al., PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-1.66.pdf