Valeria Chiariello
2024
Topic Modeling for Auditing Purposes in the Banking Sector
Alessandro Giaconia, Valeria Chiariello, Marco Passarotti
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
This study explores the application of topic modeling techniques for auditing purposes in the banking sector, focusing on the analysis of suspicious activity reports. We compare three topic modeling algorithms, Latent Dirichlet Allocation (LDA), the Embedded Topic Model (ETM), and Product of Experts LDA (ProdLDA), on a dataset of 35,000 suspicious activity reports from an Italian bank. The models were evaluated with three metrics: coherence score, NPMI coherence, and topic diversity. Our results show that ProdLDA consistently outperformed LDA and ETM, with the best performance achieved using 1-gram word embeddings. The study reveals distinct topics related to specific client activities, cross-border transactions, and high-risk business sectors such as gambling. These results demonstrate the potential of advanced topic modeling techniques to make auditing processes in the banking sector more efficient and effective, particularly in the analysis of suspicious activities that may be linked to money laundering and terrorism.
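Two of the evaluation metrics named above can be illustrated with a short sketch. It assumes the commonly used definitions: topic diversity as the share of unique words among the top-k words of all topics, and NPMI estimated from document-level co-occurrence, where NPMI(w_i, w_j) = log(P(w_i, w_j) / (P(w_i) P(w_j))) / (-log P(w_i, w_j)), averaged over word pairs within each topic and then over topics. The function names, the `top_k` default, the small `eps` term, and the convention of scoring never-co-occurring pairs as -1 are illustrative assumptions, not the authors' implementation. The toy documents are invented and do not come from the paper's dataset.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of all topics (1.0 = no overlap)."""
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topics, documents, top_k=10, eps=1e-12):
    """Mean NPMI over all word pairs in each topic's top_k words, averaged over topics.

    Probabilities are document frequencies: P(w) is the share of documents
    containing w. Assumes every topic has at least two words.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for w_i, w_j in combinations(topic[:top_k], 2):
            p_i, p_j, p_ij = prob(w_i), prob(w_j), prob(w_i, w_j)
            if p_ij == 0:
                # Assumed convention: pairs that never co-occur get the minimum NPMI.
                pair_scores.append(-1.0)
                continue
            pmi = math.log(p_ij / (p_i * p_j))
            # eps avoids division by zero when a pair occurs in every document.
            pair_scores.append(pmi / (-math.log(p_ij) + eps))
        topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores)


# Toy usage with invented, already tokenized documents.
docs = [
    ["transfer", "abroad", "country"],
    ["transfer", "abroad"],
    ["gambling", "betting"],
    ["gambling", "betting", "cash"],
]
topics = [["transfer", "abroad"], ["gambling", "betting"]]
print(topic_diversity(topics))                 # 1.0
print(round(npmi_coherence(topics, docs), 6))  # 1.0
```

In this toy case each topic's two words always appear together, so NPMI reaches its maximum of 1, and the topics share no words, so diversity is also 1. On real reports both values would be lower, and a model that scores well on one metric can score poorly on the other, which is why they are usually reported together.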