Makadi: A Large-Scale Human-Labeled Dataset for Hindi Semantic Parsing

Shashwat Vaibhav, Nisheeth Srivastava


Abstract
Parsing natural language queries into formal database calls is a well-studied problem. Because semantic markers vary widely across the world’s languages, progress on this problem is irreducibly language-dependent. This has created an asymmetry in progress on natural language interfaces to databases (NLIDB): most state-of-the-art efforts focus on the resource-rich English language, while progress for low-resource languages has been limited. In this short paper, we present Makadi, a large-scale, complex, cross-lingual, cross-domain semantic parsing and text-to-SQL dataset for the Hindi language. Produced by translating the recently introduced English-language Spider NLIDB dataset, it consists of 9693 questions and SQL queries over 166 multi-table databases covering multiple domains. It is the first large-scale Hindi dataset for semantic parsing and related language understanding tasks. Our dataset is publicly available at: Link removed to preserve anonymization during peer review.
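To make the headline figures (questions paired with SQL queries, grouped by database) concrete, the sketch below shows how such statistics could be recomputed from a split file. It assumes, hypothetically, that Makadi keeps Spider's JSON layout: a single array of example objects with `db_id`, `question` and `query` fields. The function names and the toy records are illustrative only; the Hindi questions are invented translations of Spider-style questions, not real Makadi examples.

```python
import json
from collections import Counter

# Toy records in the ASSUMED Spider-style layout; not taken from Makadi.
TOY_RECORDS = [
    {"db_id": "concert_singer", "question": "कितने गायक हैं?",
     "query": "SELECT count(*) FROM singer"},
    {"db_id": "concert_singer", "question": "सभी गायकों के नाम क्या हैं?",
     "query": "SELECT name FROM singer"},
    {"db_id": "pets_1", "question": "कुल कितने पालतू जानवर हैं?",
     "query": "SELECT count(*) FROM pets"},
]


def load_split(path):
    """Load a split stored (hypothetically) as one JSON array of examples."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(records):
    """Count question/SQL pairs and distinct databases in a list of records."""
    per_db = Counter(r["db_id"] for r in records)
    return {
        "pairs": len(records),        # one (question, query) pair per record
        "databases": len(per_db),     # distinct db_id values
        "per_db": per_db,             # pairs per database
    }


if __name__ == "__main__":
    stats = summarize(TOY_RECORDS)
    print(stats["pairs"], stats["databases"])
```

Run on a real split via `summarize(load_split(path))`, the same counts would be the figures to compare against those reported above, provided the assumed field names hold.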
Anthology ID: 2022.wildre-1.12
Volume: Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Girish Nath Jha, Sobha L., Kalika Bali, Atul Kr. Ojha
Venue: WILDRE
Publisher: European Language Resources Association
Pages: 68–73
URL: https://aclanthology.org/2022.wildre-1.12
Cite (ACL): Shashwat Vaibhav and Nisheeth Srivastava. 2022. Makadi: A Large-Scale Human-Labeled Dataset for Hindi Semantic Parsing. In Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference, pages 68–73, Marseille, France. European Language Resources Association.
Cite (Informal): Makadi: A Large-Scale Human-Labeled Dataset for Hindi Semantic Parsing (Vaibhav & Srivastava, WILDRE 2022)
PDF: https://aclanthology.org/2022.wildre-1.12.pdf
Data: WikiSQL