Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction

Sangha Nam, Minho Lee, Donghwan Kim, Kijong Han, Kuntae Kim, Sooji Yoon, Eun-kyung Kim, Key-Sun Choi


Abstract
Information extraction from unstructured text plays a vital role in natural language processing. Although each information extraction task (e.g., entity linking, coreference resolution, and relation extraction) has been studied extensively, no data are available for a continuous and coherent evaluation of all of these tasks within a single comprehensive framework. Because each task is performed and evaluated on a different dataset, it is impossible to analyze how the output of one task affects the next across the whole information extraction process. This paper aims to provide a starting point for Korean information extraction and to promote research in this field by presenting crowdsourced data collected for four information extraction tasks from the same corpus, together with training and evaluation results of a state-of-the-art model on each task. These machine learning data for Korean information extraction are the first of their kind, and there are plans to continuously increase the data volume. The test results will serve as initial baseline results for each Korean information extraction task and are expected to serve as a point of comparison for future studies on Korean information extraction that use the data collected in this study.
Anthology ID:
2020.lrec-1.27
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
212–219
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.27
Cite (ACL):
Sangha Nam, Minho Lee, Donghwan Kim, Kijong Han, Kuntae Kim, Sooji Yoon, Eun-kyung Kim, and Key-Sun Choi. 2020. Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 212–219, Marseille, France. European Language Resources Association.
Cite (Informal):
Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction (Nam et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.27.pdf