Mingming Zhang

Also published as: 明明


AIGT: AI Generative Table Based on Prompt
Mingming Zhang | Zhiqing Xiao | Guoshan Lu | Sai Wu | Weiqiang Wang | Xing Fu | Can Yi | Junbo Zhao
Proceedings of the 31st International Conference on Computational Linguistics

Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcoming the challenges of high-dimensional data that arise from one-hot encoding. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table based on prompt enhancement, a novel approach that utilizes metadata information, such as table descriptions and schemas, as prompts to generate ultra-high-quality synthetic data. To overcome the token limit constraints of LLMs, we propose long-token partitioning algorithms that enable AIGT to model tables of any scale. AIGT achieves state-of-the-art performance on 14 out of 20 public datasets and two real industry datasets within the Alipay risk control system.


篇章级小句复合体结构自动分析(Chinese Clause Complex Structure Automatic Analysis on Passage)
Zhiyong Luo (罗智勇) | Ruifang Han (韩瑞昉) | Mingming Zhang (张明明) | Yujiao Han (韩玉蛟) | Zhilin Zhao (赵志琳)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


基于话头话体共享结构信息的机器阅读理解研究(Research on Machine Reading Comprehension Based on Shared Structure Information between Naming and Telling)
Yujiao Han (韩玉蛟) | Zhiyong Luo (罗智勇) | Mingming Zhang (张明明) | Zhilin Zhao (赵志琳) | Qing Zhang (张青)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

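The machine reading comprehension (MRC) task tests a machine's ability to understand natural language by having it answer questions about a given context. Neural MRC models built on large-scale pre-trained language models have made significant progress, but answer-extraction accuracy still needs improvement when the answer, clue, and question elements span punctuation clauses or are linked over long distances. This paper analyzes the Naming-Telling structure within a passage to establish long-distance relations between punctuation clauses and to fill in shared missing components, thereby assisting answer extraction in machine reading comprehension. We design and implement an MRC model that incorporates Naming-Telling structure information. Experiments on the public dataset CMRC2018 show that the model improves F1 by 2.4% and EM by 6% relative to the baseline model.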

基于神经网络的半监督CRF中文分词(Semi-supervised CRF Chinese Word Segmentation based on Neural Network)
Zhiyong Luo (罗智勇) | Mingming Zhang (张明明) | Yujiao Han (韩玉蛟) | Zhilin Zhao (赵志琳)
Proceedings of the 21st Chinese National Conference on Computational Linguistics
