Indexing in DBMS
Why Indexing?
Without an index, a database must scan every row in a table to find matching records (full table scan). For large tables with millions of rows, this is extremely slow. An index is a data structure that allows the database to find rows much faster — similar to a book's index that lets you jump directly to a topic instead of reading every page.
Trade-offs:
- Indexes speed up SELECT queries (reads)
- Indexes slow down INSERT, UPDATE, DELETE (writes) because the index must be updated
- Indexes consume additional disk space
Dense vs Sparse Index
| Type | Description | Pros | Cons |
|---|---|---|---|
| Dense Index | An index entry for every record in the data file | Faster search (direct lookup) | More storage space |
| Sparse Index | Index entries only for some records (e.g., one per block) | Less storage space | Requires sequential scan within block |
Types of Indexes
| Index Type | Description | Use Case |
|---|---|---|
| Primary Index | Built on the primary key. Data file is ordered by the key. One entry per block (sparse). | Fast lookup by primary key |
| Secondary Index | Built on non-primary key attributes. Data file may not be ordered by this key. Dense index. | Fast lookup by non-key attributes |
| Clustering Index | Built on a non-key attribute that orders the data file. One entry per distinct value. | Range queries on ordered non-key fields |
| Unique Index | Ensures all values in the indexed column are unique. | Enforce uniqueness on non-PK columns |
| Composite Index | Index on multiple columns. | Queries filtering on multiple columns |
| Full-Text Index | Optimized for text search operations. | LIKE queries, text search |
B-Tree and B+ Tree Index
The most common index structure in relational databases is the B+ Tree:
- B-Tree: A balanced tree where each node can have multiple keys and children. Data pointers exist at all levels (internal and leaf nodes).
- B+ Tree: An enhanced B-Tree where:
- All data pointers are stored only in leaf nodes
- Internal nodes contain only keys (for routing)
- Leaf nodes are linked in a doubly-linked list (enables efficient range queries)
- All leaf nodes are at the same level (balanced)
Why B+ Tree is preferred:
- Range queries are efficient (traverse linked leaf nodes)
- Internal nodes can hold more keys (no data pointers), so the tree is shorter
- Fewer disk I/Os for most queries
- Used by MySQL (InnoDB), PostgreSQL, Oracle, SQL Server
Hash Index
A hash index uses a hash function to map key values to bucket addresses. It provides O(1) average-case lookup for equality queries.
- Pros: Very fast for equality searches (WHERE id = 5)
- Cons: Cannot support range queries (WHERE id > 5), ordering, or LIKE queries
- Use case: Hash joins, in-memory tables (MySQL MEMORY engine)
Bitmap Index
A bitmap index uses a bit array (bitmap) for each distinct value of an attribute. Each bit represents whether a row has that value.
- Best for: Low-cardinality columns (few distinct values) like Gender (M/F), Status (Active/Inactive)
- Pros: Very compact, fast for AND/OR operations
- Cons: Poor for high-cardinality columns, expensive to update
- Used in: Data warehouses, OLAP systems (Oracle, PostgreSQL)
Index Best Practices
- Index columns used frequently in WHERE, JOIN, and ORDER BY clauses
- Avoid indexing columns with very low cardinality (e.g., boolean columns)
- Use composite indexes for queries that filter on multiple columns
- The leftmost column of a composite index must be in the WHERE clause for the index to be used
- Don't over-index — each index slows down writes
- Use EXPLAIN/EXPLAIN ANALYZE to check if queries use indexes
Ready to Level Up Your Skills?
Explore 500+ free tutorials across 20+ languages and frameworks.