Vitória Bispo
2026
ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese
Luis A. Duarte | Pedro Giacomin | Vitória Bispo | Mariana O. Silva | Adriano C. M. Pereira | Gisele L. Pappa
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
We present ConsumerBR, a large-scale corpus of consumer complaints and company responses in Brazilian Portuguese, compiled from publicly available data on the Consumidor.gov.br platform. The corpus comprises over 3.1 million consumer–company interactions collected between 2021 and 2025 and combines anonymized textual content with rich structured metadata, including temporal information, complaint outcomes, and consumer satisfaction indicators. We describe a data collection strategy tailored to the platform’s dynamic interface, a preprocessing pipeline that includes response clustering to identify template-based replies, and a hybrid anonymization approach designed to mitigate privacy risks. We also provide a detailed statistical characterization of the corpus, highlighting its scale, coverage, and distributional properties. ConsumerBR is publicly available for research purposes and supports a wide range of applications, including complaint analysis, sentiment modeling, dialogue and response generation, and preference-based evaluation.
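The abstract mentions response clustering to identify template-based company replies but does not say which method is used. The sketch below shows one common way to do it, under stated assumptions; it is not the authors' pipeline. Each response is normalized (lowercased, accents stripped, digits such as protocol numbers collapsed to a placeholder), turned into word trigrams, and grouped by greedy leader clustering on Jaccard similarity. The helper names (`normalize`, `shingles`, `jaccard`, `cluster_responses`), the trigram size, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, strip accents, mask digits, drop punctuation."""
    text = unicodedata.normalize("NFKD", text.lower())
    # Remove combining marks so "solicitação" becomes "solicitacao".
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Collapse numbers (e.g. protocol IDs) so otherwise identical templates match.
    text = re.sub(r"\d+", "0", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams; short texts fall back to a single tuple of all tokens."""
    tokens = text.split()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_responses(responses: list[str], threshold: float = 0.5, n: int = 3) -> list[list[int]]:
    """Greedy leader clustering (assumed method): each response joins the first
    cluster whose leader is at least `threshold` similar, else starts a new one."""
    leaders: list[set] = []
    clusters: list[list[int]] = []
    for idx, resp in enumerate(responses):
        sh = shingles(normalize(resp), n)
        for c, leader in enumerate(leaders):
            if jaccard(sh, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(sh)
            clusters.append([idx])
    return clusters
```

For example, two replies that differ only in the customer's name and protocol number ("Olá, Maria! Sua solicitação 12345 foi resolvida..." and "Olá, João! Sua solicitação 98765 foi resolvida...") land in the same cluster, while a free-form reply starts its own. Comparing every response against every leader is quadratic in the worst case, so at the scale of millions of interactions an approximate method such as MinHash with locality-sensitive hashing would likely be a better fit.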