Normalization in DBMS
What is Normalization?
Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables and defining relationships between them. The process follows a series of rules called Normal Forms (NF).
Goals of Normalization:
- Eliminate redundant data (same data stored in multiple places)
- Ensure data dependencies make sense (only storing related data in a table)
- Prevent update, insertion, and deletion anomalies
Functional Dependencies
A functional dependency X → Y means that the value of X uniquely determines the value of Y. If two tuples have the same value for X, they must have the same value for Y.
- Full Functional Dependency: Y depends on the entire composite key X, not just part of it.
- Partial Dependency: Y depends on only part of a composite key.
- Transitive Dependency: X → Y → Z (Z depends on X through Y, where Y is not a key).
First Normal Form (1NF)
Rule: All attributes must contain atomic (indivisible) values. No repeating groups or arrays.
Before 1NF (violates atomicity):
| Student_ID | Name | Courses |
|---|---|---|
| 1 | Alice | Math, Physics, Chemistry |
| 2 | Bob | Math, Biology |
After 1NF:
| Student_ID | Name | Course |
|---|---|---|
| 1 | Alice | Math |
| 1 | Alice | Physics |
| 1 | Alice | Chemistry |
| 2 | Bob | Math |
| 2 | Bob | Biology |
Second Normal Form (2NF)
Rule: Must be in 1NF AND every non-key attribute must be fully functionally dependent on the entire primary key (no partial dependencies). Applies only when the primary key is composite.
Before 2NF (partial dependency: Student_Name depends only on Student_ID, not on the full key {Student_ID, Course_ID}):
| Student_ID (PK) | Course_ID (PK) | Student_Name | Course_Name | Grade |
|---|---|---|---|---|
| 1 | C1 | Alice | Math | A |
| 1 | C2 | Alice | Physics | B |
| 2 | C1 | Bob | Math | A- |
After 2NF (decompose into separate tables):
| Student_ID (PK) | Student_Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| Course_ID (PK) | Course_Name |
|---|---|
| C1 | Math |
| C2 | Physics |
| Student_ID (PK) | Course_ID (PK) | Grade |
|---|---|---|
| 1 | C1 | A |
| 1 | C2 | B |
| 2 | C1 | A- |
Third Normal Form (3NF)
Rule: Must be in 2NF AND no non-key attribute should be transitively dependent on the primary key.
Before 3NF (transitive dependency: Dept_Name depends on Dept_ID, which depends on Emp_ID):
| Emp_ID (PK) | Emp_Name | Dept_ID | Dept_Name |
|---|---|---|---|
| 1 | Alice | D1 | Engineering |
| 2 | Bob | D1 | Engineering |
| 3 | Charlie | D2 | Marketing |
After 3NF:
| Emp_ID (PK) | Emp_Name | Dept_ID (FK) |
|---|---|---|
| 1 | Alice | D1 |
| 2 | Bob | D1 |
| 3 | Charlie | D2 |
| Dept_ID (PK) | Dept_Name |
|---|---|
| D1 | Engineering |
| D2 | Marketing |
BCNF, 4NF, and Denormalization
Boyce-Codd Normal Form (BCNF): A stricter version of 3NF. For every functional dependency X → Y, X must be a super key. BCNF handles anomalies that 3NF misses when there are multiple overlapping candidate keys.
Fourth Normal Form (4NF): Must be in BCNF AND have no multi-valued dependencies. A multi-valued dependency occurs when one attribute determines a set of values for another attribute, independent of other attributes.
Denormalization: The process of intentionally introducing redundancy into a normalized database to improve read performance. Used in data warehouses and reporting systems where query speed is more important than update efficiency.
Ready to Level Up Your Skills?
Explore 500+ free tutorials across 20+ languages and frameworks.