Mai Omura


2024

Collection of Japanese Route Information Reference Expressions Using Maps as Stimuli
Yoshiko Kawabata | Mai Omura | Hikari Konishi | Masayuki Asahara | Johane Takeuchi
Proceedings of the 4th Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP 2024)

We constructed a database of Japanese expressions based on route information. Using 20 maps as stimuli, we asked 40 individuals per route to describe routes between two points on each map, collecting 1,600 route information reference expressions. We determined whether the expressions were based solely on relative reference expressions by using landmarks on the maps. For expressions that used only relative reference expressions, we labeled the presence or absence of information regarding the starting point, waypoints, and destination. Additionally, we collected clarity ratings for each expression through a survey.

2023

Spatial Information Annotation Based on the Double Cross Model
Yoshiko Kawabata | Mai Omura | Masayuki Asahara | Johane Takeuchi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation
Mai Omura | Hiroshi Matsuda | Masayuki Asahara | Aya Wakasa
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this study, we developed Universal Dependencies (UD) resources for spoken Japanese from the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large corpus of spoken language that encompasses various everyday conversations in Japanese and includes word delimitation and part-of-speech annotation. We newly annotated Long Word Unit delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries, for the CEJC. The UD Japanese resources were constructed from the CEJC in accordance with hand-maintained conversion rules, using two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues pertaining to the construction of UD for the CEJC by comparing it with the written Japanese corpus and evaluating UD parsing accuracy.

2021

Word Delimitation Issues in UD Japanese
Mai Omura | Aya Wakasa | Masayuki Asahara
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)

2018

Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

UD-Japanese BCCWJ: Universal Dependencies Annotation for the Balanced Corpus of Contemporary Written Japanese
Mai Omura | Masayuki Asahara
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

In this paper, we describe UD Japanese-BCCWJ, a corpus created by converting the Balanced Corpus of Contemporary Written Japanese (BCCWJ), a Japanese language corpus, to adhere to the UD annotation schema. The BCCWJ already assigns dependency information at the level of the bunsetsu (a Japanese syntactic unit comparable to the phrase). We developed a program to convert the BCCWJ to UD based on this dependency structure; the corpus is the result of fully automatic conversion using this program. UD Japanese-BCCWJ is the largest UD Japanese corpus and the second-largest of all UD corpora, comprising 1,980 documents, 57,109 sentences, and 1,273k words across six distinct domains.