UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation

Mai Omura; Hiroshi Matsuda; Masayuki Asahara; Aya Wakasa

doi:10.18653/v1/2023.sigdial-1.29

Abstract

In this study, we developed Universal Dependencies (UD) resources for spoken Japanese from the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large corpus of spoken language that covers a variety of everyday conversations in Japanese and includes word delimitation and part-of-speech annotation. We newly annotated the CEJC with Long Word Unit delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries. The Japanese UD resources were constructed from the CEJC using hand-maintained conversion rules, drawing on two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues in constructing UD for the CEJC by comparing it with a written Japanese corpus and by evaluating UD parsing accuracy.

Anthology ID: 2023.sigdial-1.29
Volume: Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: September
Year: 2023
Address: Prague, Czechia
Editors: Svetlana Stoyanchev, Shafiq Joty, David Schlangen, Ondrej Dusek, Casey Kennington, Malihe Alikhani
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 324–335
URL: https://aclanthology.org/2023.sigdial-1.29/
DOI: 10.18653/v1/2023.sigdial-1.29
Cite (ACL): Mai Omura, Hiroshi Matsuda, Masayuki Asahara, and Aya Wakasa. 2023. UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 324–335, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal): UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation (Omura et al., SIGDIAL 2023)
PDF: https://aclanthology.org/2023.sigdial-1.29.pdf
