%0 Conference Proceedings
%T TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
%A Yin, Pengcheng
%A Neubig, Graham
%A Yih, Wen-tau
%A Riedel, Sebastian
%Y Jurafsky, Dan
%Y Chai, Joyce
%Y Schluter, Natalie
%Y Tetreault, Joel
%S Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
%D 2020
%8 July
%I Association for Computational Linguistics
%C Online
%F yin-etal-2020-tabert
%X Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like semantic parsing over structured data, which require reasoning over both free-form NL questions and structured tabular data (e.g., database tables). In this paper we present TaBERT, a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts. In experiments, neural semantic parsers using TaBERT as feature representation layers achieve new best results on the challenging weakly-supervised semantic parsing benchmark WikiTableQuestions, while performing competitively on the text-to-SQL dataset Spider.
%R 10.18653/v1/2020.acl-main.745
%U https://aclanthology.org/2020.acl-main.745
%U https://doi.org/10.18653/v1/2020.acl-main.745
%P 8413-8426