Table Question Answering for Low-resourced Indic Languages

Vaishali Pal, Evangelos Kanoulas, Andrew Yates, Maarten de Rijke


Abstract
TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output. TableQA research has focused primarily on high-resource languages, leaving medium- and low-resource languages with little progress due to the scarcity of annotated data and neural models. We address this gap by introducing a fully automatic, large-scale tableQA data generation process for low-resource languages on a limited budget. We apply our data generation method to two Indic languages, Bengali and Hindi, which have no existing tableQA datasets or models. TableQA models trained on our large-scale datasets outperform state-of-the-art LLMs. We further analyze the trained models along several dimensions, including mathematical reasoning capabilities and zero-shot cross-lingual transfer. Our work is the first on low-resource tableQA focusing on scalable data generation and evaluation procedures. Our proposed data generation method can be applied to any low-resource language with a web presence. We release datasets, models, and code (https://github.com/kolk/Low-Resource-TableQA-Indic-languages).
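To make the task format concrete, the sketch below shows how a question and a table can be linearized into a single prompt for a seq2seq tableQA model, as is common for models of this kind. This is a minimal illustration only: the checkpoint path, the linearize_table helper, and the exact linearization scheme are assumptions for demonstration, not the paper's pipeline; consult the linked repository for the released datasets and models.

# Minimal sketch of seq2seq tableQA inference, assuming a Hugging Face
# checkpoint. MODEL_NAME is a hypothetical placeholder, and the
# linearization format is one common convention, not necessarily the
# paper's exact input format.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "path/to/tableqa-checkpoint"  # placeholder; see the GitHub repo

def linearize_table(header: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into one string: header first, then numbered rows."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(c) for c in row))
    return " ".join(parts)

def answer(question: str, header: list[str], rows: list[list[str]]) -> str:
    """Concatenate question and linearized table, then generate an answer."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    prompt = question + " " + linearize_table(header, rows)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Toy Hindi example ("Which city has the highest population?")
    header = ["शहर", "जनसंख्या"]
    rows = [["मुंबई", "20411000"], ["दिल्ली", "16787941"]]
    print(answer("किस शहर की जनसंख्या सबसे अधिक है?", header, rows))

The expected output for a cell-selection question like this would be a single cell value (here, a city name); questions requiring aggregation or comparison would instead return a computed or composed answer string.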
Anthology ID:
2024.emnlp-main.5
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
75–92
URL:
https://aclanthology.org/2024.emnlp-main.5
Cite (ACL):
Vaishali Pal, Evangelos Kanoulas, Andrew Yates, and Maarten de Rijke. 2024. Table Question Answering for Low-resourced Indic Languages. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 75–92, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Table Question Answering for Low-resourced Indic Languages (Pal et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.5.pdf
Software:
 2024.emnlp-main.5.software.zip
Data:
 2024.emnlp-main.5.data.zip