Christos Faloutsos


Automatic Table Union Search with Tabular Representation Learning
Xuming Hu | Shen Wang | Xiao Qin | Chuan Lei | Zhengyuan Shen | Christos Faloutsos | Asterios Katsifodimos | George Karypis | Lijie Wen | Philip S. Yu
Findings of the Association for Computational Linguistics: ACL 2023

Given a data lake of tabular data as well as a query table, how can we retrieve all the tables in the data lake that can be unioned with the query table? Table union search constitutes an essential task in data discovery and preparation, as it enables data scientists to navigate massive open data repositories. Existing methods identify unionability based on column representations (word surface forms or token embeddings) and represent the relation between two columns by the similarity of their representations. However, the semantic similarity between column representations is often insufficient to reveal the latent relational features that describe the relation between a pair of columns, and it is not robust to table noise. To address these issues, in this paper, we propose a multi-stage self-supervised table union search framework called AutoTUS, which represents the column relation as a vector (the column relational representation) and learns this representation in a multi-stage manner so that it better describes the column relation for unionability prediction. In particular, the contextualized column relation encoder, powered by a large language model, is updated iteratively through adaptive clustering and pseudo-label classification so that a better column relational representation can be learned. Moreover, to improve the robustness of the model against table noise, we propose a table noise generator that adds noise to the training table data. Experiments on real-world datasets as well as a synthetic test set augmented with table noise show that AutoTUS achieves a 5.2% performance gain over the state-of-the-art (SOTA) baseline.
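For readers unfamiliar with the similarity-based approach that the abstract contrasts AutoTUS against, the following is a minimal sketch of it, assuming each column has already been embedded as a vector. The scoring rule (match every query column to its most similar candidate column by cosine similarity, then average) and the names `unionability_score` and `rank_tables` are hypothetical illustrations; this is not AutoTUS and not any specific published baseline.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two column embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def unionability_score(query_cols: list[np.ndarray], cand_cols: list[np.ndarray]) -> float:
    """Hypothetical baseline score: for each query column, take the best-matching
    candidate column by cosine similarity, then average over the query columns."""
    best = [max(cosine(q, c) for c in cand_cols) for q in query_cols]
    return float(np.mean(best))


def rank_tables(query_cols: list[np.ndarray], lake: dict[str, list[np.ndarray]], k: int = 3) -> list[str]:
    """Return the names of the k data-lake tables with the highest score."""
    ranked = sorted(lake, key=lambda name: unionability_score(query_cols, lake[name]), reverse=True)
    return ranked[:k]


# Toy 2-d "embeddings" (made up for illustration only).
query = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
lake = {
    "t1": [np.array([1.0, 0.1]), np.array([0.1, 1.0])],  # near-parallel columns
    "t2": [np.array([1.0, 1.0])],                        # one generic column
    "t3": [np.array([-1.0, 0.0])],                       # opposite direction
}
print(rank_tables(query, lake, k=2))
```

Because such a score depends only on pairwise embedding similarity, a small amount of noise in a column's values can shift its embedding and change the ranking; that sensitivity is the weakness the abstract points to.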

Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs
Zijie Huang | Daheng Wang | Binxuan Huang | Chenwei Zhang | Jingbo Shang | Yan Liang | Zhengyang Wang | Xian Li | Christos Faloutsos | Yizhou Sun | Wei Wang
Findings of the Association for Computational Linguistics: ACL 2023

Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjointness among them. Box volumes can be interpreted as concepts’ granularity. Unlike concepts, entities are modeled as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly created industrial KG show the effectiveness of Concept2Box.
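The geometric ideas in this abstract (box volume as granularity, overlap between concept boxes, and a distance from an entity vector to a concept box) can be illustrated with a small sketch. It assumes axis-aligned boxes parameterized by lower and upper corners, and it uses one common style of vector-to-box distance (L1 distance outside the box plus a down-weighted distance to the box center inside it). This distance and the names `Box`, `intersection_volume`, `vector_to_box_distance`, and `alpha` are hypothetical choices for illustration, not the metric proposed in Concept2Box.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Box:
    """Axis-aligned box given by its lower and upper corners (assumed parameterization)."""
    lower: np.ndarray
    upper: np.ndarray

    def volume(self) -> float:
        # Product of side lengths (negative sides clipped to 0);
        # a larger volume reads as a coarser-grained concept.
        return float(np.prod(np.clip(self.upper - self.lower, 0.0, None)))


def intersection_volume(a: Box, b: Box) -> float:
    """Volume of the overlap between two boxes; 0.0 if they are disjoint."""
    return Box(np.maximum(a.lower, b.lower), np.minimum(a.upper, b.upper)).volume()


def vector_to_box_distance(v: np.ndarray, box: Box, alpha: float = 0.5) -> float:
    """Hypothetical distance: L1 distance from v to the box when v lies outside,
    plus alpha times the L1 distance from v (clamped into the box) to the box center."""
    outside = np.maximum(v - box.upper, 0.0) + np.maximum(box.lower - v, 0.0)
    clamped = np.clip(v, box.lower, box.upper)
    center = (box.lower + box.upper) / 2.0
    return float(outside.sum() + alpha * np.abs(center - clamped).sum())


# Toy concepts: "dog" is nested inside the coarser "animal" box.
animal = Box(np.array([0.0, 0.0]), np.array([4.0, 4.0]))
dog = Box(np.array([1.0, 1.0]), np.array([2.0, 2.0]))
print(animal.volume(), dog.volume(), intersection_volume(animal, dog))
print(vector_to_box_distance(np.array([5.0, 5.0]), dog))
```

Here the nested box has the smaller volume, and its full volume lies inside the parent box, which is how a containment hierarchy can be read off box geometry.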


LinkNBed: Multi-Graph Representation Learning with Entity Linkage
Rakshit Trivedi | Bunyamin Sisman | Xin Luna Dong | Christos Faloutsos | Jun Ma | Hongyuan Zha
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge graphs have emerged as an important model for studying complex multi-relational data. This has given rise to the construction of numerous large-scale but incomplete knowledge graphs encoding information extracted from various resources. An effective and scalable approach to jointly learn over multiple graphs and eventually construct a unified graph is a crucial next step for the success of knowledge-based inference in many downstream applications. To this end, we propose LinkNBed, a deep relational learning framework that learns entity and relationship representations across multiple graphs. We identify entity linkage across graphs as a vital component to achieve our goal. We design a novel objective that leverages entity linkage and build an efficient multi-task training procedure. Experiments on link prediction and entity linkage demonstrate substantial improvements over the state-of-the-art relational learning approaches.


Translation Invariant Word Embeddings
Kejun Huang | Matt Gardner | Evangelos Papalexakis | Christos Faloutsos | Nikos Sidiropoulos | Tom Mitchell | Partha P. Talukdar | Xiao Fu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing