Annotation Issues in Universal Dependencies for Korean and Japanese

Ji Yoon Han, Tae Hwan Oh, Lee Jin, Hansaem Kim


Abstract
To investigate issues that arise in developing a Universal Dependencies (UD) treebank for Korean and Japanese, we begin by addressing the typological characteristics of the two languages. Both Korean and Japanese are agglutinative, head-final languages. Moreover, the principles of word segmentation in both languages differ from those of English, which makes it difficult to apply the UD guidelines. Building on these typological characteristics and the resulting problems in applying UD, we review how the UPOS and DEPREL schemes apply to the two languages. For the UPOS scheme, we discuss the annotation principles for AUX, ADJ, DET, ADP, and PART; for the DEPREL scheme, we discuss the annotation principles for case, aux, iobj, and obl.
Anthology ID:
2020.udw-1.12
Volume:
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marie-Catherine de Marneffe, Miryam de Lhoneux, Joakim Nivre, Sebastian Schuster
Venue:
UDW
Publisher:
Association for Computational Linguistics
Pages:
99–108
URL:
https://aclanthology.org/2020.udw-1.12
Cite (ACL):
Ji Yoon Han, Tae Hwan Oh, Lee Jin, and Hansaem Kim. 2020. Annotation Issues in Universal Dependencies for Korean and Japanese. In Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), pages 99–108, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Annotation Issues in Universal Dependencies for Korean and Japanese (Han et al., UDW 2020)
PDF:
https://aclanthology.org/2020.udw-1.12.pdf