Experimental Evaluation of Topic Modeling Methods for Categorizing Irregularities in Health-related news

Alysson Guimarães, Methanias Colaço Junior, Samuel Almeida, Raphael Fontes


Abstract
Context: The increasing availability of textual data has driven the application of Natural Language Processing (NLP) techniques in public administration to improve public services. Objective: This study aims to analyze topic modeling methods in the context of public health audits conducted by the National Department of SUS Auditing (AudSUS). Methods: A controlled in vitro experiment was conducted to assess the performance of the methods in topic modeling tasks using coherence metrics. Results: The LSA method stood out among models with the highest average C_V and C_NPMI coherence. LSA-based models achieved superior performance compared to 215 other models in configurations with lower top-n and top-k values. Overall, the statistical analysis confirms that the observed differences among the models are not due to random variation. Conclusion: The results underscore the potential of topic modeling methods for clustering news articles that exhibit indications of irregularities, thereby guiding information retrieval during the analytical phase of the audit process. This approach enhances the overall effectiveness of audits and facilitates faster preparation of teams for the operational stage.
Anthology ID: 2026.propor-1.5
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 41–56
URL: https://aclanthology.org/2026.propor-1.5/
Cite (ACL): Alysson Guimarães, Methanias Colaço Junior, Samuel Almeida, and Raphael Fontes. 2026. Experimental Evaluation of Topic Modeling Methods for Categorizing Irregularities in Health-related news. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 41–56, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Experimental Evaluation of Topic Modeling Methods for Categorizing Irregularities in Health-related news (Guimarães et al., PROPOR 2026)
PDF: https://aclanthology.org/2026.propor-1.5.pdf