A Benchmark Suite of Japanese Natural Questions

Takuya Uematsu; Hao Wang (汪浩); Daisuke Kawahara; Tomohide Shibata

doi:10.18653/v1/2024.starsem-1.5

Abstract

To develop high-performance and robust natural language processing (NLP) models, it is important to have various question answering (QA) datasets to train, evaluate, and analyze them. Although there are various QA datasets available in English, there are only a few QA datasets in other languages. We focus on Japanese, a language with only a few basic QA datasets, and aim to build a Japanese version of Natural Questions (NQ) consisting of questions that naturally arise from human information needs. We collect natural questions from query logs of a Japanese search engine and build the dataset using crowdsourcing. We construct Japanese Natural Questions (JNQ) and a Japanese version of BoolQ (JBoolQ), which is derived from NQ and consists of yes/no questions. JNQ consists of 16,871 questions, and JBoolQ consists of 6,467 questions. We also define two tasks from JNQ and one from JBoolQ and establish baselines using competitive methods drawn from related literature. We hope that these datasets will facilitate research on QA and NLP models in Japanese. We are planning to release JNQ and JBoolQ.

Anthology ID: 2024.starsem-1.5
Volume: Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Danushka Bollegala, Vered Shwartz
Venue: *SEM
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 58–68
URL: https://aclanthology.org/2024.starsem-1.5/
DOI: 10.18653/v1/2024.starsem-1.5
Cite (ACL): Takuya Uematsu, Hao Wang, Daisuke Kawahara, and Tomohide Shibata. 2024. A Benchmark Suite of Japanese Natural Questions. In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), pages 58–68, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): A Benchmark Suite of Japanese Natural Questions (Uematsu et al., *SEM 2024)
PDF: https://aclanthology.org/2024.starsem-1.5.pdf
Video: https://aclanthology.org/2024.starsem-1.5.mp4
