Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs

Khiem Pham, Guangliang Chen


Abstract
Spectral clustering has received a lot of attention due to its ability to separate nonconvex, non-intersecting manifolds, but its high computational complexity has significantly limited its applicability. Motivated by the document-term co-clustering framework by Dhillon (2001), we propose a landmark-based scalable spectral clustering approach in which we first use the selected landmark set and the given data to form a bipartite graph and then run a diffusion process on it to obtain a family of diffusion coordinates for clustering. We show that our proposed algorithm can be implemented based on very efficient operations on the affinity matrix between the given data and selected landmarks, thus capable of handling large data. Finally, we demonstrate the excellent performance of our method by comparing with the state-of-the-art scalable algorithms on several benchmark data sets.
Anthology ID:
W18-1705
Volume:
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana, USA
Venues:
NAACL | TextGraphs | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
28–37
Language:
URL:
https://aclanthology.org/W18-1705
DOI:
10.18653/v1/W18-1705
Bibkey:
Cite (ACL):
Khiem Pham and Guangliang Chen. 2018. Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), pages 28–37, New Orleans, Louisiana, USA. Association for Computational Linguistics.
Cite (Informal):
Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs (Pham & Chen, 2018)
Copy Citation:
PDF:
https://aclanthology.org/W18-1705.pdf