2022
Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline
Ruixue Liu | Shaozu Yuan | Aijun Dai | Lei Shen | Tiangang Zhu | Meng Chen | Xiaodong He
Proceedings of the 29th International Conference on Computational Linguistics
Few-shot table understanding is a critical and challenging problem in real-world scenarios, as annotating large numbers of tables is usually costly. Pre-trained language models (PLMs), which have recently flourished on tabular data, have demonstrated their effectiveness for table understanding tasks. However, few-shot table understanding is rarely explored due to the lack of a public table pre-training corpus and well-defined downstream benchmark tasks, especially in Chinese. In this paper, we establish a benchmark dataset, FewTUD, which consists of 5 different tasks with human annotations to systematically explore few-shot table understanding in depth. Since there is no large collection of public Chinese tables, we also collect a large-scale, multi-domain tabular corpus to facilitate future Chinese table pre-training, which includes one million tables and related natural language text with auxiliary supervised interaction signals. Finally, we present FewTPT, a novel table PLM with rich interactions over tabular data, and evaluate its performance comprehensively on the benchmark. Our dataset and model will be released to the public soon.