Scaling Range Queries with B-Tree Indexing
Learn how the hierarchical structure of B-Trees enables fast search, sorting, and range-based retrieval in modern relational databases.
The Fundamental Problem: Linear Time Complexity
In a database table without any optimization, records are typically stored in the order they were inserted, creating a structure known as a heap. When you execute a query to find a specific user by their email address, the database engine has no choice but to start at the first row and scan every subsequent record until it finds a match. This process is known as a full table scan and represents a linear time complexity of O(N).
For a small application with a few thousand rows, the cost of a full table scan is negligible; it may even be faster than using an index because it avoids the index's overhead. However, as the application scales to millions of records, the time required to scan the entire dataset grows proportionally. If each disk read takes a fraction of a millisecond, scanning a million rows can quickly add up to multi-second delays that degrade the user experience.
To solve this, we need a way to find data without looking at every row. This is exactly what an index provides: a supplementary data structure that keeps pointers to the data in a sorted or highly organized manner. By trading off a small amount of storage space and write performance, we gain the ability to locate records in logarithmic time.
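As a quick illustration in PostgreSQL (the `users` table, `idx_users_email` name, and email value are hypothetical; note that on a very small table the planner may still choose a sequential scan, since reading a handful of pages is cheaper than traversing an index):

```sql
-- A minimal table with no secondary indexes
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email   VARCHAR(255) NOT NULL
);

-- Without an index on email, this lookup is a full table scan: O(N)
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- With a B-Tree index on email, the same lookup becomes O(log N)
CREATE INDEX idx_users_email ON users (email);
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
```

Running EXPLAIN before and after creating the index shows the plan change from a sequential scan to an index scan, which is the single most direct way to confirm an index is actually being used.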
The primary goal of database indexing is not just to make queries faster, but to ensure that query performance remains predictable and stable as the volume of data grows by several orders of magnitude.
Visualizing the Search Space
Think of a database index like the index at the back of a massive technical textbook. Instead of flipping through every page to find a mention of concurrency, you look up the word alphabetically in the index to find specific page numbers. The database does the same thing, using the index to jump directly to the physical location of the data on the disk storage.
Without this structure, the database engine is essentially a blind traveler walking every street in a city to find a specific house number. With an index, the engine has a map that tells it exactly which neighborhood, street, and block to visit. This reduction in the search space is what allows modern web platforms to serve millions of requests simultaneously.
The Architecture of a B-Tree
Most relational databases like PostgreSQL, MySQL, and SQL Server use a variation of the B-Tree, specifically the B+Tree, as their default indexing structure. A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. Unlike binary search trees, B-Trees are specifically optimized for systems that read and write large blocks of data.
The most distinctive feature of a B-Tree is its high branching factor. While a binary tree node has at most two children, a B-Tree node can have hundreds of children, which allows the tree to remain very shallow even when storing billions of keys. A typical B-Tree index for a massive table might only be three or four levels deep, meaning the database needs at most three or four page reads to find any specific record.
- Root Node: The entry point for every search, containing the highest-level range boundaries.
- Internal Nodes: Intermediate levels that act as signposts to direct the search to the correct child page.
- Leaf Nodes: The bottom layer where the actual indexed values and pointers to the table rows are stored.
- Fill Factor: A configuration setting that determines how much empty space is left in each node to accommodate future inserts.
Each node in the tree corresponds to a page of data on the disk, usually 8KB or 16KB in size. When the database searches for a value, it loads the root page into memory, compares the search key against the pointers, and then loads the next relevant child page. This minimizes the number of expensive I/O operations required to navigate the hierarchy.
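To make these numbers concrete, here is a sketch in PostgreSQL. The arithmetic assumes a fanout of roughly 500 keys per page; `bt_metap` comes from the optional pageinspect extension, and `'my_index'` is a placeholder for any existing B-Tree index name:

```sql
-- Back-of-the-envelope: with ~500 keys per page, a tree only
-- four levels deep can address 500^4 = 62.5 billion entries
SELECT power(500, 4) AS addressable_keys;

-- PostgreSQL reports its page size in bytes (8192 by default)
SHOW block_size;

-- The pageinspect extension exposes the actual depth of an index
-- ('my_index' is a placeholder for a real B-Tree index name)
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT level FROM bt_metap('my_index');
```

A `level` of 2 or 3 on a multi-million-row table is typical, which is exactly the shallowness the branching factor buys us.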
Why B-Trees Beat Binary Trees
Binary search trees are excellent for in-memory operations but perform poorly on disk-based storage because they tend to become very deep. Each level in a tree usually requires a separate disk seek, and disk seeks are thousands of times slower than memory access. By using a wide branching factor, B-Trees trade depth for width: each disk read pulls in a large page packed with hundreds of keys, so far fewer levels, and therefore far fewer seeks, are needed to reach any record.
Furthermore, B-Trees are designed to stay balanced automatically. Regardless of the order in which data is inserted, the distance from the root to any leaf node remains the same. This balance guarantees that the worst-case performance for a lookup is always predictable and consistent, which is a requirement for high-availability database systems.
Traversing the Tree: Search and Range Logic
When you search for a specific value, the database performs a point lookup by comparing your criteria against the keys in the root node. It identifies the range where your value would reside and follows the associated pointer to the next level down. This process repeats until the engine reaches a leaf node, which contains the definitive answer regarding the existence of the record.
In a B+Tree, the leaf nodes are especially powerful because they are linked together in a doubly-linked list. This means that once the database finds the starting point of a range, it does not need to go back up to the root to find the next record. It can simply move horizontally across the leaf nodes to collect all records that satisfy the range condition.
-- Create a table for a high-traffic e-commerce inventory
CREATE TABLE product_inventory (
    product_id SERIAL PRIMARY KEY,
    sku_code VARCHAR(50) NOT NULL,
    price_cents INTEGER NOT NULL,
    stock_count INTEGER NOT NULL
);

-- Create a B-Tree index specifically for price filtering
-- This enables fast range scans for price-based searches
CREATE INDEX idx_products_price ON product_inventory(price_cents);

-- The following query will perform a range scan on the index
-- It finds the first leaf node >= 5000 and follows links until > 10000
SELECT sku_code, stock_count
FROM product_inventory
WHERE price_cents BETWEEN 5000 AND 10000;

The ability to perform these range scans is why B-Trees are preferred over Hash indexes for most general-purpose applications. While a Hash index can provide O(1) lookups for exact matches, it cannot handle inequality filters or sorting because the hashing process destroys the natural order of the data. B-Trees maintain this order, making them versatile for a wide variety of query patterns.
The Power of Sorting
Because the B-Tree stores data in a sorted fashion, it can often satisfy an ORDER BY clause without requiring an expensive sort operation in memory. If a query requests data sorted by the indexed column, the database simply traverses the leaf nodes in order. This can significantly reduce the CPU and memory pressure on the database server during heavy reporting tasks.
This optimization also applies to grouping operations. When you use GROUP BY on an indexed column, the database can aggregate values as it encounters them in the sorted stream of the index. This avoids the need to build a temporary hash table or perform a multi-pass sort, leading to much faster response times for analytical queries.
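Using the idx_products_price index on product_inventory from earlier, a quick way to verify this behavior (a sketch; the exact plan depends on table size and statistics):

```sql
-- With an index on price_cents, the planner can walk the leaf
-- chain in sorted order instead of performing an explicit sort
EXPLAIN
SELECT sku_code, price_cents
FROM product_inventory
ORDER BY price_cents
LIMIT 20;
-- A plan showing an index scan on idx_products_price with no
-- separate "Sort" node confirms the index satisfied the ORDER BY
```

The LIMIT makes the win especially visible: the database reads only the first twenty leaf entries and stops, rather than sorting the entire table to return twenty rows.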
The Cost of Performance: Write Amplification
Every index you add to a table is a double-edged sword that increases the cost of every write operation. When you insert a new row into a table, the database must not only write the raw data to the heap but also navigate every single index on that table to insert a new pointer. This phenomenon is known as write amplification, and it can become a bottleneck for write-heavy applications.
Update operations can be even more expensive if they modify a column that is part of an index. In this case, the database must delete the old entry from the index and insert a new one in the correct sorted position. If the updated value moves the record to a different part of the tree, it may trigger structural changes across multiple pages.
A particularly complex event is the node split, which occurs when an insertion targets a leaf node that is already full. The database must allocate a new page, move half of the existing keys to the new page, and then update the parent node to reflect the change. If the parent is also full, the split can propagate all the way up to the root, temporarily increasing latency for that specific write.
-- Use EXPLAIN ANALYZE to see the impact of indexes on insert speed
-- A table with 10 indexes will be significantly slower than one with 1
EXPLAIN ANALYZE
INSERT INTO product_inventory (sku_code, price_cents, stock_count)
VALUES ('SKU-999-PROD', 2599, 150);

-- This query helps identify indexes that are never used but still incur write costs
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

To mitigate these costs, developers should avoid the temptation to index every column in a table. Instead, use query monitoring tools to identify the most frequent and slowest queries, then design targeted indexes to support them. Removing unused indexes is just as important as creating new ones for maintaining a healthy, high-performance database.
Page Fragmentation and Bloat
As data is deleted or updated, gaps can appear within the B-Tree nodes. While the database tries to reuse this space for new records, certain patterns of random deletions can lead to fragmentation. Over time, an index may occupy far more disk space than the actual data it contains, leading to increased I/O during scans.
Regular maintenance tasks, such as reindexing or running a vacuum operation in PostgreSQL, can help reclaim this wasted space. Understanding the fill factor of your indexes is also useful; a lower fill factor provides more room for inserts but makes the tree slightly deeper and larger on disk. Balancing these factors is a key part of database tuning.
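Both knobs can be exercised directly in PostgreSQL. This is a sketch: idx_products_sku is an illustrative index name, and REINDEX CONCURRENTLY requires PostgreSQL 12 or later:

```sql
-- Leave 30% of each leaf page free to absorb future inserts
-- (idx_products_sku is an illustrative index name)
CREATE INDEX idx_products_sku ON product_inventory (sku_code)
WITH (fillfactor = 70);

-- Rebuild a bloated index without blocking concurrent writes
REINDEX INDEX CONCURRENTLY idx_products_sku;
```

The default fill factor for B-Tree indexes in PostgreSQL is 90; dropping it toward 70 is mainly worthwhile for indexes that receive many inserts into the middle of the key range rather than append-only inserts at the end.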
Designing Effective Composite Indexes
A composite index is an index that covers multiple columns at once, such as an index on (last_name, first_name). These are incredibly powerful for queries that filter on multiple attributes, but they follow a strict rule known as the left-prefix rule. The database can only use a composite index if the query uses the leading columns of the index definition.
If you have an index on (country, city) and you search for users in a specific country, the index is highly effective. If you search for both a country and a city, it is even better. However, if you search only for a city without specifying a country, the database cannot jump to the correct part of the index and will likely ignore it in favor of a full table scan.
The order of columns in a composite index should be determined by two main factors: the cardinality of the columns and the query patterns of your application. Generally, you want the column that is most likely to be used in an equality filter and has the highest selectivity to be the first column in the index. This maximizes the amount of data filtered out at the very first step of the search.
The sequence of columns in a composite index is not interchangeable. A single index on (A, B) is fundamentally different from an index on (B, A) in terms of the queries it can optimize.
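The left-prefix rule can be demonstrated with a hypothetical users_geo table (all names here are illustrative):

```sql
CREATE TABLE users_geo (
    user_id SERIAL PRIMARY KEY,
    country VARCHAR(2)  NOT NULL,
    city    VARCHAR(80) NOT NULL
);
CREATE INDEX idx_users_geo ON users_geo (country, city);

-- Can use the index: filters on the leading column
SELECT * FROM users_geo WHERE country = 'DE';

-- Even better: both columns constrained, so the search narrows
-- to one contiguous slice of the index
SELECT * FROM users_geo WHERE country = 'DE' AND city = 'Berlin';

-- Cannot seek into the index: the leading column is missing,
-- so matching entries are scattered across every country
SELECT * FROM users_geo WHERE city = 'Berlin';
```

If the city-only query is frequent, the fix is a second index with city as the leading column, not a reordering of the existing one.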
Covering Indexes and Index-Only Scans
An index becomes a covering index when it contains all the columns requested by a query. In this scenario, the database engine can retrieve all the necessary data directly from the B-Tree without ever touching the actual table rows. This is known as an index-only scan and is one of the fastest retrieval methods available.
By including a few extra columns in your index that are frequently selected but not used in filters, you can eliminate the overhead of heap fetches. However, this must be balanced against the increased size of the index and the higher maintenance cost. Modern databases like PostgreSQL allow you to use the INCLUDE clause to add non-key columns to an index specifically for this purpose.
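As a sketch against the product_inventory table from earlier (INCLUDE requires PostgreSQL 11 or later; idx_price_covering is an illustrative name):

```sql
-- Key column for filtering, non-key payload columns for the SELECT list
CREATE INDEX idx_price_covering ON product_inventory (price_cents)
INCLUDE (sku_code, stock_count);

-- Every column this query touches lives inside the index, so it
-- can be answered with an index-only scan, never visiting the heap
SELECT sku_code, stock_count
FROM product_inventory
WHERE price_cents BETWEEN 5000 AND 10000;
```

Putting sku_code and stock_count in the INCLUDE clause rather than the key keeps the internal pages of the tree small, since non-key columns are stored only in the leaf level.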
