Normalization — 1NF, 2NF, 3NF, BCNF Explained
What is Normalization?
Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables and defining relationships between them. The process follows a series of rules called Normal Forms (NF).
Goals of Normalization:
- Eliminate redundant data (same data stored in multiple places)
- Ensure data dependencies make sense (only storing related data in a table)
- Prevent update, insertion, and deletion anomalies
Functional Dependencies
A functional dependency X -> Y means that the value of X uniquely determines the value of Y. If two tuples have the same value for X, they must have the same value for Y.
- Full Functional Dependency: Y depends on the entire composite key X, not just part of it.
- Partial Dependency: Y depends on only part of a composite key.
- Transitive Dependency: X -> Y -> Z (Z depends on X through Y, where Y is not a key).
First Normal Form (1NF)
Rule: All attributes must contain atomic (indivisible) values. No repeating groups or arrays.
Before 1NF (violates atomicity):
| Student_ID | Name | Courses |
|---|---|---|
| 1 | Alice | Math, Physics, Chemistry |
| 2 | Bob | Math, Biology |
After 1NF:
| Student_ID | Name | Course |
|---|---|---|
| 1 | Alice | Math |
| 1 | Alice | Physics |
| 1 | Alice | Chemistry |
| 2 | Bob | Math |
| 2 | Bob | Biology |
Second Normal Form (2NF)
Rule: Must be in 1NF AND every non-key attribute must be fully functionally dependent on the entire primary key (no partial dependencies). Applies only when the primary key is composite.
Before 2NF (partial dependency: Student_Name depends only on Student_ID, not on the full key {Student_ID, Course_ID}):
| Student_ID (PK) | Course_ID (PK) | Student_Name | Course_Name | Grade |
|---|---|---|---|---|
| 1 | C1 | Alice | Math | A |
| 1 | C2 | Alice | Physics | B |
| 2 | C1 | Bob | Math | A- |
After 2NF (decompose into separate tables):
| Student_ID (PK) | Student_Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| Course_ID (PK) | Course_Name |
|---|---|
| C1 | Math |
| C2 | Physics |
| Student_ID (PK) | Course_ID (PK) | Grade |
|---|---|---|
| 1 | C1 | A |
| 1 | C2 | B |
| 2 | C1 | A- |
Third Normal Form (3NF)
Rule: Must be in 2NF AND no non-key attribute should be transitively dependent on the primary key.
Before 3NF (transitive dependency: Dept_Name depends on Dept_ID, which depends on Emp_ID):
| Emp_ID (PK) | Emp_Name | Dept_ID | Dept_Name |
|---|---|---|---|
| 1 | Alice | D1 | Engineering |
| 2 | Bob | D1 | Engineering |
| 3 | Charlie | D2 | Marketing |
After 3NF:
| Emp_ID (PK) | Emp_Name | Dept_ID (FK) |
|---|---|---|
| 1 | Alice | D1 |
| 2 | Bob | D1 |
| 3 | Charlie | D2 |
| Dept_ID (PK) | Dept_Name |
|---|---|
| D1 | Engineering |
| D2 | Marketing |
BCNF, 4NF, and Denormalization
Boyce-Codd Normal Form (BCNF): A stricter version of 3NF. For every functional dependency X -> Y, X must be a super key. BCNF handles anomalies that 3NF misses when there are multiple overlapping candidate keys.
Fourth Normal Form (4NF): Must be in BCNF AND have no multi-valued dependencies. A multi-valued dependency occurs when one attribute determines a set of values for another attribute, independent of other attributes.
Denormalization: The process of intentionally introducing redundancy into a normalized database to improve read performance. Used in data warehouses and reporting systems where query speed is more important than update efficiency.
Level Up Your Dbms Skills
Master Dbms with these hand-picked resources