Design and Evaluation of the Corpus of Everyday Japanese Conversation

Hanae Koiso; Haruka Amatani; Yasuharu Den; Yuriko Iseki; Yuichi Ishimoto; Wakako Kashino; Yoshiko Kawabata; Ken’ya Nishikawa; Yayoi Tanaka; Yasuyuki Usuda; Yuka Watanabe

Abstract

We constructed the Corpus of Everyday Japanese Conversation (CEJC) and published it in March 2022. The CEJC is designed to contain various kinds of everyday conversations in a balanced manner to capture their diversity. The CEJC features not only audio but also video data to facilitate precise understanding of the mechanisms of real-life social behavior. Publishing a large-scale corpus of everyday conversations that includes video data is a new approach. The CEJC contains 200 hours of speech, 577 conversations, about 2.4 million words, and a total of 1675 conversants. In this paper, we present an overview of the corpus, including the recording method and devices, the structure of the corpus, the formats of the video and audio files, transcription, and annotations. We then report some results of the evaluation of the CEJC in terms of conversant and conversation attributes. We show that the CEJC includes a good balance of adult conversants in terms of gender and age, as well as a variety of conversations in terms of conversation forms, places, activities, and numbers of conversants.

Anthology ID: 2022.lrec-1.599
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 5587–5594
URL: https://aclanthology.org/2022.lrec-1.599/
Cite (ACL): Hanae Koiso, Haruka Amatani, Yasuharu Den, Yuriko Iseki, Yuichi Ishimoto, Wakako Kashino, Yoshiko Kawabata, Ken’ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda, and Yuka Watanabe. 2022. Design and Evaluation of the Corpus of Everyday Japanese Conversation. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5587–5594, Marseille, France. European Language Resources Association.
Cite (Informal): Design and Evaluation of the Corpus of Everyday Japanese Conversation (Koiso et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.599.pdf
