Sang-Won Park


2020

Constructing a Korean Named Entity Recognition Dataset for the Financial Domain using Active Learning
Dong-Ho Jeong | Min-Kang Heo | Hyung-Chul Kim | Sang-Won Park
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The performance of deep learning models depends on the quality and quantity of data. Data construction, however, is time-consuming and costly, and when data for an expert domain are constructed, the availability of experts is limited. In such cases, active learning can efficiently improve model performance with minimal data construction. Although various datasets have been constructed using active learning techniques, the construction of Korean data for expert domains has not yet been studied extensively. In this study, a named entity recognition corpus for the financial domain was constructed using active learning. The contributions of the study are as follows: (1) it was verified that active learning can effectively construct a named entity recognition corpus for the financial domain, and (2) a named entity recognizer for the financial domain was developed. A dataset of 8,043 sentences was constructed using the proposed method, and the performance of the named entity recognizer reached 80.84%. Moreover, the proposed method reduced data construction costs by 12–25%.
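The abstract does not say which query strategy was used, so the following is only a minimal sketch of one common active learning selection step, least-confidence sampling, and not the authors' method. The names `least_confidence`, `select_batch`, and `pool`, as well as the toy probabilities, are hypothetical and exist only for illustration. Each unlabeled sentence is represented by the per-token label probabilities a current tagger might output, and the sentences the tagger is least sure about are sent to annotators first.

```python
from typing import Dict, List, Sequence


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Sentence uncertainty: 1 minus the mean of each token's top label probability."""
    if not token_probs:
        return 0.0
    top = [max(p) for p in token_probs]
    return 1.0 - sum(top) / len(top)


def select_batch(pool: Dict[str, Sequence[Sequence[float]]], k: int) -> List[str]:
    """Return the ids of the k most uncertain unlabeled sentences."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:k]


# Toy pool (assumed values): sentence id -> per-token label distributions.
pool = {
    "s1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    "s2": [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],
    "s3": [[0.6, 0.2, 0.2], [0.7, 0.2, 0.1]],
}

# Pick the two sentences to annotate next.
selected = select_batch(pool, k=2)
print(selected)
```

In a full loop, the selected sentences would be labeled by experts, added to the training set, and the tagger retrained before the next selection round; the cost saving comes from reaching a target score with fewer labeled sentences than random selection would need.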