Xiao Qin
2023
Automatic Table Union Search with Tabular Representation Learning
Xuming Hu | Shen Wang | Xiao Qin | Chuan Lei | Zhengyuan Shen | Christos Faloutsos | Asterios Katsifodimos | George Karypis | Lijie Wen | Philip S. Yu
Findings of the Association for Computational Linguistics: ACL 2023
Given a data lake of tabular data and a query table, how can we retrieve all the tables in the data lake that can be unioned with the query table? Table union search is an essential task in data discovery and preparation, as it enables data scientists to navigate massive open data repositories. Existing methods identify unionability based on column representations (word surface forms or token embeddings) and on column relations represented by the similarity of those column representations. However, the semantic similarity between column representations is often insufficient to reveal the latent relational features that describe the relation between a pair of columns, and it is not robust to table noise. To address these issues, we propose a multi-stage self-supervised table union search framework called AutoTUS, which represents a column relation as a vector (the column relational representation) and learns this representation in a multi-stage manner so that it better describes column relations for unionability prediction. In particular, the contextualized column relation encoder, powered by a large language model, is updated iteratively by adaptive clustering and pseudo-label classification so that a better column relational representation can be learned. Moreover, to improve the robustness of the model against table noise, we propose a table noise generator that adds noise to the training table data. Experiments on real-world datasets as well as a synthetic test set augmented with table noise show that AutoTUS achieves a 5.2% performance gain over the SOTA baseline.
2020
A Dual-Attention Network for Joint Named Entity Recognition and Sentence Classification of Adverse Drug Events
Susmitha Wunnava | Xiao Qin | Tabassum Kakar | Xiangnan Kong | Elke Rundensteiner
Findings of the Association for Computational Linguistics: EMNLP 2020
An adverse drug event (ADE) is an injury resulting from medical intervention related to a drug. Automatic ADE detection from text is either fine-grained (ADE entity recognition) or coarse-grained (ADE assertive sentence classification), with limited efforts leveraging the inter-dependencies between the two granularities. We instead propose a multi-grained joint deep network to concurrently learn the ADE entity recognition and ADE sentence classification tasks. Our joint approach takes advantage of their symbiotic relationship, with a transfer of knowledge between the two levels of granularity. Our dual-attention mechanism constructs multiple distinct representations of a sentence that capture both task-specific and semantic information in the sentence, providing stronger emphasis on the key elements essential for sentence classification. Our model improves the state-of-the-art F1-score for both tasks: (i) entity recognition of ADE words (12.5% increase) and (ii) ADE sentence classification (13.6% increase) on the MADE 1.0 benchmark of EHR notes.
2010
CRF-based Experiments for Cross-Domain Chinese Word Segmentation at CIPS-SIGHAN-2010
Xiao Qin | Liang Zong | Yuqian Wu | Xiaojun Wan | Jianwu Yang
CIPS-SIGHAN Joint Conference on Chinese Language Processing
Co-authors
- Susmitha Wunnava 1
- Tabassum Kakar 1
- Xiangnan Kong 1
- Elke Rundensteiner 1
- Xuming Hu 1