Fast Exact Retrieval for Nearest-neighbor Lookup (FERN)

Richard Zhu


Abstract
Exact nearest neighbor search is a computationally intensive process, and even its simpler sibling, vector retrieval, can be computationally complex. This is exacerbated when retrieving vectors which have high dimension d relative to the number of vectors, N, in the database. Exact nearest neighbor retrieval has been generally acknowledged to be an O(Nd) problem with no sub-linear solutions. Attention has instead shifted towards Approximate Nearest-Neighbor (ANN) retrieval techniques, many of which have sub-linear or even logarithmic time complexities. However, if our intuition from binary search problems (e.g. d=1 vector retrieval) carries, there ought to be a way to retrieve an organized representation of vectors without brute-forcing our way to a solution. For low dimensions (e.g. the d=2 or d=3 cases), kd-trees provide an O(d log N) algorithm for retrieval. Unfortunately, in practice the algorithm deteriorates rapidly to an O(dN) solution at high dimensions (e.g. d=128). We propose a novel algorithm for logarithmic Fast Exact Retrieval for Nearest-neighbor lookup (FERN), inspired by kd-trees. The algorithm achieves O(d log N) look-up with 100% recall on 10 million d=128 uniformly randomly generated vectors.
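To make the two baselines mentioned in the abstract concrete, the following is a minimal Python sketch of the O(Nd) brute-force scan and of the d=1 case, where a sorted list and binary search give an O(log N) exact lookup. It is not the FERN algorithm, which the abstract does not specify; the function names `brute_force_nn` and `sorted_nn_1d` are hypothetical and chosen only for illustration.

```python
import bisect


def brute_force_nn(database, query):
    """Exact nearest neighbor by scanning all N vectors of dimension d: O(N d).

    Returns the index of the closest vector under squared Euclidean distance.
    (Illustrative baseline only; not the method proposed in the paper.)
    """
    best_index, best_dist = -1, float("inf")
    for i, vec in enumerate(database):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def sorted_nn_1d(sorted_values, query):
    """Exact nearest neighbor for d=1 on a pre-sorted list: O(log N).

    Binary search finds the insertion point; the nearest value must be
    one of the (at most two) elements adjacent to that point.
    """
    pos = bisect.bisect_left(sorted_values, query)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_values)]
    return min(candidates, key=lambda i: abs(sorted_values[i] - query))
```

For example, with `values = [0.0, 1.0, 2.5, 4.0]`, both `sorted_nn_1d(values, 2.0)` and `brute_force_nn([[v] for v in values], [2.0])` return the index of 2.5. The second function touches only O(log N) entries, which is the intuition the abstract carries over to higher dimensions.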
Anthology ID:
2024.naacl-srw.6
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
42–47
URL:
https://aclanthology.org/2024.naacl-srw.6
DOI:
10.18653/v1/2024.naacl-srw.6
Cite (ACL):
Richard Zhu. 2024. Fast Exact Retrieval for Nearest-neighbor Lookup (FERN). In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 42–47, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Fast Exact Retrieval for Nearest-neighbor Lookup (FERN) (Zhu, NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-srw.6.pdf