Sicheol Sung


2026

Regular expressions (regexes) are foundational to modern computing for critical tasks like input validation and data parsing, yet their ubiquity exposes systems to regular expression denial of service (ReDoS), a vulnerability requiring automated repair methods. Current approaches, however, are hampered by a trade-off. Symbolic, rule-based systems are precise but fail to repair unseen or complex vulnerability patterns. Conversely, large language models (LLMs) possess the necessary generalizability but are unreliable for tasks demanding strict syntactic and semantic correctness. We resolve this impasse by introducing a hybrid framework, localized regex repair (LRR), designed to harness LLM generalization while enforcing reliability. Our core insight is to decouple problem identification from the repair process. First, a deterministic, symbolic module localizes the precise vulnerable subpattern, creating a constrained and tractable problem space. Then, the LLM is invoked to generate a semantically equivalent fix for this isolated segment. This combined architecture successfully resolves complex repair cases intractable for rule-based repair while avoiding the semantic errors of LLM-only approaches. Our work provides a validated methodology for solving such problems in automated repair, improving the repair rate by 15.4%p over the state-of-the-art.
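The sketch below illustrates the localize-then-repair idea in miniature. The function names (localize_vulnerable_subpattern, repair_with_llm), the nested-quantifier heuristic, and the hard-coded rewrite are illustrative assumptions, not the paper's actual algorithm; in LRR the repair step would be an LLM call rather than a lookup table.

```python
import re

# A classic ReDoS-prone pattern: nested quantifiers cause catastrophic
# backtracking on inputs like "aaaa...!" in backtracking regex engines.
VULNERABLE = r"^(a+)+$"

def localize_vulnerable_subpattern(pattern: str):
    """Toy symbolic localizer (assumption for this sketch): flag a quantified
    group that is itself under a quantifier, a common ReDoS shape."""
    m = re.search(r"\((?:[^()]*[+*])\)[+*]", pattern)
    return (m.start(), m.end()) if m else None

def repair_with_llm(subpattern: str) -> str:
    """Stand-in for the LLM repair call: here a hard-coded, semantically
    equivalent linear-time rewrite, for demonstration only."""
    return "a+" if subpattern == "(a+)+" else subpattern

span = localize_vulnerable_subpattern(VULNERABLE)
if span:
    start, end = span
    fixed = VULNERABLE[:start] + repair_with_llm(VULNERABLE[start:end]) + VULNERABLE[end:]
    print(f"localized {VULNERABLE[start:end]!r}; repaired pattern: {fixed!r}")
```

Note how the symbolic step shrinks the problem: the LLM only has to rewrite the five-character segment `(a+)+`, not reason about the whole pattern.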

2025

The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. To mitigate these issues, we propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TrapDoc, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TrapDoc serves as a strong foundation for promoting more responsible and thoughtful engagement with language models.
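The sketch below shows one way "imperceptible" injection can work in principle, hiding a payload in zero-width Unicode characters that render invisibly but survive copy-paste into an LLM prompt. The encoding scheme and placement are assumptions for illustration; the paper's actual phantom-token method may differ.

```python
# Zero-width characters used to encode hidden bits (assumed scheme).
ZERO_WIDTH = {"0": "\u200b", "1": "\u200c"}  # zero-width space / non-joiner

def inject_phantom(visible_text: str, payload: str) -> str:
    """Hide the payload's bits between visible characters; the result
    displays identically to the original for a human reader."""
    bits = "".join(f"{ord(c):08b}" for c in payload)
    hidden = "".join(ZERO_WIDTH[b] for b in bits)
    # Place the invisible run after the first word so it travels with the text.
    head, _, tail = visible_text.partition(" ")
    return head + hidden + " " + tail

doc = inject_phantom("Please summarize this report.", "IGNORE")
print(doc == "Please summarize this report.")  # False: the bytes differ
print(doc)  # ...yet it looks unchanged on screen
```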