
Vector Databases

Optimizing Search with HNSW and IVF Indexing

Deep dive into how graph-based and cluster-based indexing strategies balance query speed, memory usage, and retrieval accuracy.

Databases · Intermediate · 12 min read

The Geometry of Search: Moving Beyond Tabular Queries

Traditional relational databases are designed to handle structured data through exact matches and range queries. When you search for a user by their unique identifier or filter orders by a specific date range, the engine uses B-Trees or Hash indexes to navigate a clear, ordered landscape. This deterministic approach fails when we shift to unstructured data like natural language, audio, or visual features represented as high-dimensional vectors.

A vector database does not look for equality but for similarity within a multi-dimensional coordinate system. In this space, two pieces of data are related if the distance between their representative points is small. The fundamental challenge arises from the number of dimensions involved, which often ranges from 768 to over 1,500 in modern embedding models.

Performing a brute-force comparison across millions of vectors requires calculating the distance for every single entry in the database. This process has a computational complexity of O(N), which quickly leads to latencies that are unacceptable for real-time applications. To solve this, we must use Approximate Nearest Neighbor algorithms that trade a small amount of accuracy for massive gains in speed.
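The O(N) baseline is easy to see in code. The following sketch (using synthetic random vectors purely for illustration) computes one L2 distance per stored vector for a single query, which is exactly the work an ANN index is designed to avoid:

```python
import numpy as np

# Toy corpus: 10,000 vectors of dimension 768
rng = np.random.default_rng(0)
data = rng.random((10_000, 768)).astype("float32")
query = rng.random(768).astype("float32")

# Brute-force search: one L2 distance per stored vector, O(N) work
distances = np.linalg.norm(data - query, axis=1)
top5 = np.argsort(distances)[:5]
print(top5, distances[top5])
```

At 10,000 vectors this is instant; at hundreds of millions, the same linear scan per query is what pushes latency past real-time budgets.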

The core engineering challenge in vector databases is not just calculating distance, but intelligently avoiding the calculation of distances for 99 percent of the dataset.

The Curse of Dimensionality

As the number of dimensions increases, the volume of the space grows exponentially, causing the data points to become extremely sparse. In high-dimensional spaces, the distance between any two points often becomes nearly uniform, making traditional spatial indexing techniques like R-trees ineffective. This phenomenon is why specialized indexing strategies are required to maintain search relevance at scale.
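Distance concentration is simple to demonstrate empirically. In this small experiment on uniform random points (an illustrative setup, not a benchmark), the ratio between the farthest and nearest neighbor of a query shrinks toward 1 as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# The ratio of the farthest to the nearest distance from a query point
# shrinks toward 1 as dimensionality grows: points become near-equidistant.
ratios = {}
for dim in (2, 32, 768):
    points = rng.random((1000, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    ratios[dim] = d.max() / d.min()
    print(dim, round(ratios[dim], 2))
```

When nearest and farthest neighbors are nearly the same distance away, spatial pruning structures like R-trees lose their ability to discard regions of the space, which is why approximate methods take over.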

Engineers must choose between strategies that emphasize graph traversal or those that partition the space into manageable clusters. Each approach handles the curse of dimensionality differently, affecting how much memory the system consumes and how quickly it can return results to the end user.

Graph-Based Indexing with HNSW

Hierarchical Navigable Small World graphs are currently the most popular choice for high-performance vector retrieval. The algorithm organizes vectors into a multi-layered graph where the top layers are sparse and the bottom layers are dense. This architecture allows the search process to skip across large regions of the vector space before zooming in on a local neighborhood.

A useful mental model for HNSW is the skip list data structure, applied to a spatial graph. By navigating long-range edges in the upper layers, the search engine identifies the general vicinity of the target vector with very few operations. Once the general area is found, the search descends to more granular layers to find the most similar neighbors.

Configuring an HNSW Index

```python
import faiss
import numpy as np

# Dimension of the vector space
dimension = 768
# Number of neighbors for each node in the graph
M = 32
# Size of the dynamic candidate list during construction
ef_construction = 200

# Initialize the HNSW index using L2 distance
index = faiss.IndexHNSWFlat(dimension, M)
index.hnsw.efConstruction = ef_construction

# Random vector data for demonstration
data = np.random.random((10000, dimension)).astype('float32')
index.add(data)

# Higher efSearch values increase accuracy but slow down query speed
index.hnsw.efSearch = 64
print(f"Total vectors indexed: {index.ntotal}")
```

The primary advantage of HNSW is its exceptional query speed and high recall rates, often exceeding 95 percent accuracy. However, this performance comes at a high cost in terms of memory because the entire graph structure must be stored in RAM to ensure low-latency traversal. For massive datasets, the memory overhead can become the primary bottleneck in scaling the infrastructure.
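A back-of-envelope calculation makes the memory cost concrete. The sketch below assumes float32 vectors plus roughly 2 × M neighbor links per vector at 4 bytes each; real implementations add per-layer and allocator overhead on top, so treat this as a lower bound rather than an exact figure:

```python
# Rough RAM estimate for an HNSW index (assumption: float32 components
# plus about 2*M int32 neighbor ids per vector; real layouts add overhead).
def hnsw_ram_gb(n_vectors: int, dim: int, M: int) -> float:
    vector_bytes = 4 * dim      # float32 components
    link_bytes = 4 * 2 * M      # bidirectional neighbor lists, int32 ids
    return n_vectors * (vector_bytes + link_bytes) / 1e9

# 100 million 768-d vectors with M = 32
print(f"{hnsw_ram_gb(100_000_000, 768, 32):.1f} GB")
```

At this scale the raw vectors alone dominate, which is why the alternatives discussed below compress or tier the data instead of holding everything in RAM.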

Tuning HNSW Parameters

The parameter M defines the number of bi-directional links created for every new element during construction. A larger M value results in better recall for high-dimensional data but increases the size of the index on disk and in memory. This is a critical trade-off when optimizing for throughput versus infrastructure costs.

The efConstruction parameter determines how many neighbors are explored during the initial index build phase. Increasing this value leads to a more accurate graph but significantly extends the time required to build the index. Developers should balance these settings based on whether their application prioritizes fast indexing or high-quality search results.

Cluster-Based Indexing with IVF

The Inverted File Index strategy takes a different approach by partitioning the vector space into distinct regions through clustering. Using an algorithm like K-means, the system identifies centroids that act as the representative centers of various regions. Every vector in the database is then assigned to its nearest centroid, creating a structure similar to a filing cabinet.

When a query arrives, the system first finds the centroids closest to the query vector. Instead of searching the entire database, it only examines the vectors assigned to those specific clusters. This dramatically reduces the number of distance calculations required and allows the index to scale to much larger datasets than a standard graph.

  • nlist: The number of clusters or centroids to create during the training phase.
  • nprobe: The number of clusters to search through during the retrieval phase.
  • Centroid: The mathematical center point of a cluster used to represent a region of space.
  • Quantization: A technique used within IVF to compress vectors and save memory.

IVF is highly effective when memory is limited because it can be combined with Product Quantization to compress vectors into small codes. This allows billions of vectors to fit into a relatively small amount of RAM. However, the search speed is typically slower than HNSW, and if the query falls near the edge of a cluster, the system might miss the actual nearest neighbor.
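The compression arithmetic explains why this works at billion scale. Assuming Product Quantization stores each vector as m codes of `bits` bits each (the scheme used by faiss's IndexIVFPQ), a 768-dimensional float32 vector shrinks from 3,072 bytes to 8:

```python
# PQ footprint: m sub-quantizers, each emitting a `bits`-bit code
dim, m, bits = 768, 8, 8
raw_bytes = 4 * dim          # float32 storage for the original vector
pq_bytes = m * bits // 8     # compressed code size
print(raw_bytes, pq_bytes, raw_bytes / pq_bytes)
```

A 384x reduction means a billion vectors need on the order of 8 GB for the codes themselves, at the cost of the lossy reconstruction that makes edge-of-cluster misses more likely.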

The Role of nprobe in Query Accuracy

The nprobe parameter is the most important setting for controlling the balance between speed and accuracy in an IVF index. If nprobe is set to one, the system only searches the single closest cluster, which is very fast but risks missing relevant data. Increasing nprobe expands the search to adjacent clusters, improving recall at the expense of query latency.

In production environments, it is common to perform a grid search to find the optimal nprobe value for a given latency budget. This allows engineers to guarantee that search results meet the necessary quality threshold without overloading the database clusters.

Architectural Trade-offs and Real-World Scenarios

Selecting the right indexing strategy requires a deep understanding of your data access patterns and hardware constraints. If your application requires sub-millisecond responses for a relatively small dataset, HNSW is almost always the superior choice. However, if you are building a system to search through billions of web pages or logs, the memory-efficient nature of IVF becomes essential.

Updating an index also presents different challenges for each strategy. HNSW supports incremental updates quite well, allowing you to add new vectors to the graph without rebuilding the entire structure. IVF indexes typically require a training phase where centroids are calculated, meaning that significant changes in data distribution may necessitate a full re-index of the collection.

Implementing IVF with Product Quantization

```python
import faiss
import numpy as np

dimension = 768
data = np.random.random((10000, dimension)).astype('float32')

# Define the number of centroids (clusters)
nlist = 100
# Number of sub-quantizers for compression
m = 8
# Bits per sub-quantizer
bits = 8

# Create a quantized IVF index
quantizer = faiss.IndexFlatL2(dimension)
index_ivf = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, bits)

# IVF requires a training step to find centroids
train_data = np.random.random((5000, dimension)).astype('float32')
index_ivf.train(train_data)

# Add vectors to the index after training
index_ivf.add(data)

# Set nprobe to search more clusters during query
index_ivf.nprobe = 10
distances, indices = index_ivf.search(data[:5], 5)
```

Another factor to consider is the nature of the embedding model used to generate the vectors. If the model produces vectors that are very densely packed in a specific region of the vector space, an IVF index may suffer from imbalanced clusters. In such cases, the graph-based nature of HNSW can be more resilient to skewed data distributions.

Memory Footprint and Scaling

HNSW usually requires keeping the full vector and the graph edges in memory, which leads to high RAM requirements. To mitigate this, some modern vector databases implement a hybrid approach where the graph is stored on NVMe drives while caching the most frequently accessed nodes in memory. This tiered storage strategy attempts to capture the speed of graphs while maintaining the cost-effectiveness of disk storage.

When scaling horizontally, IVF indexes are often easier to shard across multiple machines. Since each cell's inverted list is independent, you can distribute the centroids and their assigned vectors across a fleet of nodes. This enables the system to handle massive throughput by parallelizing the search across different subsets of the data simultaneously.
