Eunkyung Choi
2026
Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties
Eunkyung Choi | Young Jin Suh | Siun Lee | Hongseok Oh | Juheon Kang | Won Hur | Hun Park | Wonseok Hwang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Eunkyung Choi | Young Jin Suh | Siun Lee | Hongseok Oh | Juheon Kang | Won Hur | Hun Park | Wonseok Hwang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
How capable are large language models (LLMs) in the domain of taxation? Although numerous studies have explored the legal domain, research dedicated to taxation remains scarce. Moreover, the datasets used in these studies are either simplified, failing to reflect the real-world complexities, or not released as open-source. To address this gap, we introduce PLAT, a new benchmark designed to assess the ability of LLMs to predict the legitimacy of additional tax penalties. PLAT comprises 300 examples: (1) 100 binary-choice questions, (2) 100 multiple-choice questions, and (3) 100 essay-type questions, all derived from 100 Korean court precedents. PLAT is constructed to evaluate not only LLMs’ understanding of tax law but also their performance in legal cases that require complex reasoning beyond straight forward application of statutes. Our systematic experiments with multiple LLMs reveal that (1) their baseline capabilities are limited, especially in cases involving conflicting issues that require a comprehensive understanding (not only of the statutes but also of the taxpayer’s circumstances), and (2) LLMs struggle particularly with the “AC” stages of “IRAC” even for advanced reasoning models like o3, which actively employ inference-time scaling.
2024
Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models
Yeeun Kim | Youngrok Choi | Eunkyung Choi | JinHwan Choi | Hai Jin Park | Wonseok Hwang
Findings of the Association for Computational Linguistics: EMNLP 2024
Yeeun Kim | Youngrok Choi | Eunkyung Choi | JinHwan Choi | Hai Jin Park | Wonseok Hwang
Findings of the Association for Computational Linguistics: EMNLP 2024
Large language models (LLMs) have demonstrated remarkable performance in the legal domain, with GPT-4 even passing the Uniform Bar Exam in the U.S. However their efficacy remains limited for non-standardized tasks and tasks in languages other than English. This underscores the need for careful evaluation of LLMs within each legal system before application.Here, we introduce KBL, a benchmark for assessing the Korean legal language understanding of LLMs, consisting of (1) 7 legal knowledge tasks (510 examples), (2) 4 legal reasoning tasks (288 examples), and (3) the Korean bar exam (4 domains, 53 tasks, 2,510 examples). First two datasets were developed in close collaboration with lawyers to evaluate LLMs in practical scenarios in a certified manner. Furthermore, considering legal practitioners’ frequent use of extensive legal documents for research, we assess LLMs in both a closed book setting, where they rely solely on internal knowledge, and a retrieval-augmented generation (RAG) setting, using a corpus of Korean statutes and precedents. The results indicate substantial room and opportunities for improvement.