Russian Jeopardy! Data Set for Question-Answering Systems

Elena Mikhalkova, Alexander A. Khlyupin


Abstract
Question answering (QA) is one of the most common NLP tasks and is closely related to named entity recognition, fact extraction, semantic search, and several other fields. In industry, it is highly valued in chatbots and corporate information systems. It is also a challenging task that has attracted the attention of a very broad audience through the quiz show Jeopardy! In this article, we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Ch-g-k. The data set includes 379,284 quiz-like questions, 29,375 of which come from the Russian analogue of Jeopardy! (Own Game). We examine its linguistic features and the related QA task. We conclude with the prospects for a QA challenge based on the collected data set.
Anthology ID:
2022.lrec-1.53
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
508–514
URL:
https://aclanthology.org/2022.lrec-1.53
Cite (ACL):
Elena Mikhalkova and Alexander A. Khlyupin. 2022. Russian Jeopardy! Data Set for Question-Answering Systems. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 508–514, Marseille, France. European Language Resources Association.
Cite (Informal):
Russian Jeopardy! Data Set for Question-Answering Systems (Mikhalkova & Khlyupin, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.53.pdf
Code:
evrog/russian-qa-jeopardy