Zhiqing Xiao
2025
AIGT: AI Generative Table Based on Prompt
Mingming Zhang
|
Zhiqing Xiao
|
Guoshan Lu
|
Sai Wu
|
Weiqiang Wang
|
Xing Fu
|
Can Yi
|
Junbo Zhao
Proceedings of the 31st International Conference on Computational Linguistics
Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcoming the challenges of high-dimensional data that arise from one-hot encoding. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table based on prompt enhancement, a novel approach that utilizes metadata information, such as table descriptions and schemas, as prompts to generate ultra-high-quality synthetic data. To overcome the token limit constraints of LLMs, we propose long-token partitioning algorithms that enable AIGT to model tables of any scale. AIGT achieves state-of-the-art performance on 14 out of 20 public datasets and two real industry datasets within the Alipay risk control system.