MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Bo Zheng; Yudong Chen; Zihua Xiong; Shuai Fang; Peidong He; Yang Yang; Sheng Guo

Abstract

Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme—utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision—and an MoE-augmented loss that adaptively routes features through specialized subnetworks. On industrial-scale benchmarks, it achieves +5.04% AUC and +8.28% KS over prior art under rigorous scaling. Moreover, its representations distill effectively into lightweight models, yielding +2.55% AUC and +4.85% KS under strict latency and interpretability constraints, while improving robustness to distribution shifts. Our work demonstrates that tabular data admits a foundation-model treatment—when its structural idiosyncrasies are respected.
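The abstract's distinction between structural absence and random dropout can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name, the per-feature linear embedding, the fixed (rather than trained) token vectors, and the rule that only observed cells are eligible for masking are all assumptions made for illustration.

```python
import numpy as np


def embed_with_missing_and_mask(x, value_weight, missing_token, mask_token,
                                mask_ratio, rng):
    """Embed a numeric table using separate tokens for missing and masked cells.

    Hypothetical helper, for illustration only.
    x:             (n_rows, n_features) floats, NaN marks a missing entry.
    value_weight:  (n_features, d) per-feature projection vectors.
    missing_token: (d,) vector standing in for a learnable "missing" token.
    mask_token:    (d,) vector standing in for a learnable "masked" token.
    Returns (tokens, is_missing, is_masked), tokens of shape (n_rows, n_features, d).
    """
    is_missing = np.isnan(x)
    # Assumption: only observed cells are candidates for random masking,
    # so the two kinds of absence never overlap.
    is_masked = (rng.random(x.shape) < mask_ratio) & ~is_missing

    # Replace NaN with 0 so the projection below stays finite.
    filled = np.where(is_missing, 0.0, x)
    tokens = filled[..., None] * value_weight[None, :, :]

    # Structurally missing cells and randomly masked cells get different tokens.
    tokens = np.where(is_missing[..., None], missing_token, tokens)
    tokens = np.where(is_masked[..., None], mask_token, tokens)
    return tokens, is_missing, is_masked
```

In a masked-reconstruction setup of this kind, the reconstruction loss would typically be computed only at positions where `is_masked` is true, since the original values of structurally missing cells are unknown.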

Anthology ID: 2026.findings-acl.2053
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41268–41280
URL: https://aclanthology.org/2026.findings-acl.2053/
Cite (ACL): Bo Zheng, Yudong Chen, Zihua Xiong, Shuai Fang, Peidong He, Yang Yang, and Sheng Guo. 2026. MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41268–41280, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification (Zheng et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.2053.pdf
Checklist: 2026.findings-acl.2053.checklist.pdf
