KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation

Yunseung Lee, Daehee Han


Abstract
To mitigate the annual financial losses caused by SMS phishing (smishing) in South Korea, we propose an explainable smishing detection framework built by adapting a Korean-centric large language model (LLM). Our framework not only classifies smishing attempts but also provides clear explanations, enabling users to identify and understand these threats. This end-to-end solution encompasses data collection, pseudo-label generation, and parameter-efficient task adaptation for models with fewer than five billion parameters. Our approach achieves a 15% improvement in accuracy over GPT-4 and generates high-quality explanatory text, as validated by seven automatic metrics and qualitative evaluation, including human assessments.
Anthology ID:
2024.emnlp-industry.47
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
642–656
URL:
https://aclanthology.org/2024.emnlp-industry.47
Cite (ACL):
Yunseung Lee and Daehee Han. 2024. KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 642–656, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation (Lee & Han, EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.47.pdf