Raphael Fontes
2026
Experimental Evaluation of Topic Modeling Methods for Categorizing Irregularities in Health-related news
Alysson Guimarães | Methanias Colaço Junior | Samuel Almeida | Raphael Fontes
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Context: The increasing availability of textual data has driven the application of Natural Language Processing (NLP) techniques in public administration to improve public services. Objective: This study aims to analyze topic modeling methods in the context of public health audits conducted by the National Department of SUS Auditing (AudSUS). Methods: A controlled in vitro experiment was conducted to assess the performance of the methods in topic modeling tasks using coherence metrics. Results: The LSA method stood out among models with the highest average C_V and C_NPMI coherence. LSA-based models achieved superior performance compared to 215 other models in configurations with lower top-n and top-k values. Overall, the statistical analysis confirms that the observed differences among the models are not due to random variation. Conclusion: The results underscore the potential of topic modeling methods for clustering news articles that exhibit indications of irregularities, thereby guiding information retrieval during the analytical phase of the audit process. This approach enhances the overall effectiveness of audits and facilitates faster preparation of teams for the operational stage.