2024-06-23
As an application grows and the database accumulates more data, queries that used to be fast start slowing down. Database indexing is one of the most effective ways to improve query performance. An index is a data structure that the database maintains alongside the table, allowing it to find rows without scanning every single row.
Without an index, when we run a query like SELECT * FROM users WHERE email = 'david@example.com', the database has to look at every row in the table to find matches. This is called a full table scan, and it gets slower as the table grows.
An index on the email column works like a sorted reference list. The database can quickly locate the relevant rows using a data structure, usually a B-tree, which allows lookups in logarithmic time instead of linear time. For a table with a million rows, an indexed lookup might examine around 20 entries instead of a million.
In SQL, creating an index is straightforward:
CREATE INDEX idx_users_email ON users (email);Most databases automatically create an index on the primary key. For other columns that we frequently query by, we need to add indexes ourselves.
We can also create composite indexes that cover multiple columns:
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);This index helps queries that filter by user_id, or by both user_id and created_at. The order of columns matters. The index above would not help a query that only filters by created_at.
Indexes are most beneficial on columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses. If we notice a query is slow, the first thing to check is whether the columns it filters on are indexed.
Database tools like EXPLAIN or EXPLAIN ANALYZE show how the database plans to execute a query. If we see a sequential scan on a large table, adding an index on the filtered column will likely improve performance significantly.
Indexes are not free. Every index takes up disk space because the database stores the index data structure alongside the table data. More importantly, indexes slow down write operations. When we insert, update, or delete a row, the database has to update all relevant indexes as well.
For tables with heavy write traffic and infrequent reads, too many indexes can hurt overall performance. The goal is to index the columns that are actually queried frequently and avoid indexing columns that rarely appear in query conditions.
Beyond the standard B-tree index, databases offer other types. A unique index enforces that no two rows can have the same value in the indexed column. A partial index only indexes rows that meet a certain condition, which saves space. A GIN index is useful for full-text search or JSONB columns in PostgreSQL. Choosing the right index type depends on the data and the query patterns.