Sai Wu
2025
AIGT: AI Generative Table Based on Prompt
Mingming Zhang
|
Zhiqing Xiao
|
Guoshan Lu
|
Sai Wu
|
Weiqiang Wang
|
Xing Fu
|
Can Yi
|
Junbo Zhao
Proceedings of the 31st International Conference on Computational Linguistics
Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcoming the challenges of high-dimensional data that arise from one-hot encoding. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table based on prompt enhancement, a novel approach that utilizes metadata information, such as table descriptions and schemas, as prompts to generate ultra-high-quality synthetic data. To overcome the token limit constraints of LLMs, we propose long-token partitioning algorithms that enable AIGT to model tables of any scale. AIGT achieves state-of-the-art performance on 14 out of 20 public datasets and two real industry datasets within the Alipay risk control system.
2022
Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis
Yiming Zhang
|
Min Zhang
|
Sai Wu
|
Junbo Zhao
Findings of the Association for Computational Linguistics: ACL 2022
The aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in the sentence. The development of the ABSA task is very much hindered by the lack of annotated data. To tackle this, the prior works have studied the possibility of utilizing the sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line, and for the first time, we manage to apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed as Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard the DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL has achieved state-of-the-art performance on standard benchmarks surpassing the prior work significantly.