Query Processing in DBMS
Steps in Query Processing
When a SQL query is submitted to a DBMS, it goes through several stages before results are returned:
- Parsing: The query is checked for syntax errors and converted into an internal representation (parse tree).
- Translation: The parse tree is translated into a relational algebra expression (query tree).
- Optimization: The query optimizer generates multiple execution plans and selects the most efficient one based on cost estimates.
- Evaluation: The chosen execution plan is executed by the query evaluation engine, and results are returned.
| Stage | Input | Output | Key Activity |
|---|---|---|---|
| Parsing | SQL string | Parse tree | Syntax check, tokenization |
| Translation | Parse tree | Relational algebra expression | Semantic check, schema lookup |
| Optimization | Relational algebra expression | Execution plan | Cost estimation, plan selection |
| Evaluation | Execution plan | Query result | Physical operators, I/O |
Query Tree (Relational Algebra Tree)
A query tree (also called a query evaluation tree) is a tree data structure that represents a relational algebra expression. Leaf nodes are relations (tables), and internal nodes are relational algebra operations (σ, π, ⋈, etc.).
Example: For the query SELECT name FROM students WHERE age > 20:
- Leaf node:
students(relation) - Internal node: σage > 20 (selection)
- Root node: πname (projection)
Heuristic Optimization transforms the query tree to improve efficiency before cost-based analysis:
- Push selections down: Apply σ (selection) as early as possible to reduce the number of tuples.
- Push projections down: Apply π (projection) early to reduce tuple size.
- Combine selections with Cartesian products: Convert σ(R × S) into a join (R ⋈ S).
- Reorder joins: Perform joins that produce smaller intermediate results first.
Cost-Based vs Heuristic Optimization
| Aspect | Heuristic Optimization | Cost-Based Optimization |
|---|---|---|
| Approach | Apply rules (push selections down, etc.) | Estimate cost of multiple plans, pick cheapest |
| Statistics needed | No | Yes (table sizes, index info, cardinality) |
| Quality | Good for simple queries | Better for complex queries |
| Speed | Fast (no cost computation) | Slower (evaluates many plans) |
| Used by | Older/simpler systems | PostgreSQL, Oracle, SQL Server, MySQL |
Join Ordering
For a query joining n tables, there are O(n!) possible join orderings. The optimizer uses dynamic programming or greedy algorithms to find a good order without evaluating all possibilities.
- Left-deep trees: One operand of each join is always a base relation. Allows pipelining — output of one join feeds directly into the next.
- Right-deep trees: One operand is always the result of a previous join. Requires more memory.
- Bushy trees: Both operands can be intermediate results. Most flexible but hardest to optimize.
Key principle: Perform the join that produces the smallest intermediate result first. Use selectivity estimates (from statistics) to predict result sizes.
Execution Plans
An execution plan specifies the exact physical operations to execute a query. You can view it using:
- MySQL:
EXPLAIN SELECT ...orEXPLAIN ANALYZE SELECT ... - PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS) SELECT ... - SQL Server:
SET SHOWPLAN_ALL ONor graphical execution plan in SSMS
| Plan Component | Description |
|---|---|
| Seq Scan | Full table scan — reads every row |
| Index Scan | Uses an index to find rows |
| Index Only Scan | Satisfies query entirely from index (no table access) |
| Nested Loop Join | For each row in outer table, scan inner table. Good for small tables. |
| Hash Join | Build hash table from smaller relation, probe with larger. Good for large unsorted tables. |
| Merge Join | Both inputs sorted on join key. Very efficient for sorted data. |
| Sort | Sorts rows for ORDER BY or merge join |
| Aggregate | Computes GROUP BY, COUNT, SUM, etc. |
Ready to Level Up Your Skills?
Explore 500+ free tutorials across 20+ languages and frameworks.