Topic Modeling for Auditing Purposes in the Banking Sector

Alessandro Giaconia, Valeria Chiariello, Marco Passarotti


Abstract
This study explores the application of topic modeling techniques for auditing purposes in the banking sector, focusing on the analysis of suspicious activity reports. We compare three topic modeling algorithms: Latent Dirichlet Allocation (LDA), Embedded Topic Model (ETM), and Product of Experts LDA (ProdLDA), using a dataset of 35,000 suspicious activity reports from an Italian bank. The models were evaluated using coherence score, NPMI coherence, and topic diversity metrics. Our results show that ProdLDA consistently outperformed LDA and ETM, with the best performance achieved using 1-gram word embeddings. The study reveals distinct topics related to specific client activities, cross-border transactions, and high-risk business sectors like gambling. These results demonstrate the potential of advanced topic modeling techniques to enhance the efficiency and effectiveness of auditing processes in the banking sector, particularly in the analysis of suspicious activities that could be tied to money laundering and terrorism.
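The abstract names NPMI coherence and topic diversity without defining them. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' implementation. The function names, the toy reports, the default of 25 top words, and the use of document-level co-occurrence for NPMI (sliding windows are also common) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean
    the topics are largely redundant. top_k=25 is an assumed default.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic, documents):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence
    (an assumption; the source does not say how they were estimated).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # never co-occur: lower bound of NPMI
        elif p12 == 1.0:
            scores.append(0.0)   # normaliser vanishes; treated as 0 here (assumption)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tokenised reports, invented for this example only.
toy_reports = [
    ["wire", "transfer", "abroad"],
    ["wire", "transfer", "casino"],
    ["casino", "gambling", "cash"],
    ["gambling", "cash", "deposit"],
]
toy_topics = [["wire", "transfer"], ["gambling", "cash"]]

print("diversity:", topic_diversity(toy_topics, top_k=2))
print("npmi:", [npmi_coherence(t, toy_reports) for t in toy_topics])
```

Both scores are computed from each topic's top words only, so they can be applied to the output of LDA, ETM, or ProdLDA in the same way, which is what makes a comparison across the three models possible.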
Anthology ID:
2024.clicit-1.112
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
1030–1035
URL:
https://aclanthology.org/2024.clicit-1.112/
Cite (ACL):
Alessandro Giaconia, Valeria Chiariello, and Marco Passarotti. 2024. Topic Modeling for Auditing Purposes in the Banking Sector. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 1030–1035, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Topic Modeling for Auditing Purposes in the Banking Sector (Giaconia et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.112.pdf