%0 Conference Proceedings
%T TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
%A Yin, Pengcheng
%A Neubig, Graham
%A Yih, Wen-tau
%A Riedel, Sebastian
%Y Jurafsky, Dan
%Y Chai, Joyce
%Y Schluter, Natalie
%Y Tetreault, Joel
%S Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
%D 2020
%8 July
%I Association for Computational Linguistics
%C Online
%F yin-etal-2020-tabert
%X Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like semantic parsing over structured data, which require reasoning over both free-form NL questions and structured tabular data (e.g., database tables). In this paper we present TaBERT, a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts. In experiments, neural semantic parsers using TaBERT as feature representation layers achieve new best results on the challenging weakly-supervised semantic parsing benchmark WikiTableQuestions, while performing competitively on the text-to-SQL dataset Spider.
%R 10.18653/v1/2020.acl-main.745
%U https://aclanthology.org/2020.acl-main.745
%U https://doi.org/10.18653/v1/2020.acl-main.745
%P 8413-8426