Normalization is the process of organizing data in a relational database to reduce redundancy, avoid anomalies, and improve data integrity. It divides large, poorly structured tables into smaller, well-related tables using rules called normal forms.
A normalized database stores each fact in the right place. For example, a student's name should be stored in a student table, course details should be stored in a course table, and marks should be stored in an enrollment or result table.
A table that stores many different facts together can create redundancy and anomalies. Consider this unnormalized table:
| Student_ID | Student_Name | Course_ID | Course_Name | Teacher | Teacher_Phone | Grade |
|---|---|---|---|---|---|---|
| 101 | Asha | C1 | DBMS | Dr. Rao | 90001 | A |
| 102 | Rahul | C1 | DBMS | Dr. Rao | 90001 | B |
| 101 | Asha | C2 | Networks | Prof. Sen | 90002 | A- |
| Anomaly | Meaning | Example |
|---|---|---|
| Update Anomaly | The same fact must be updated in many rows. | If Dr. Rao changes phone number, every DBMS row must be updated. |
| Insertion Anomaly | Some data cannot be inserted until unrelated data exists. | A new course cannot be stored until at least one student enrolls. |
| Deletion Anomaly | Deleting one row accidentally removes other useful facts. | If the only student in a course is deleted, course and teacher details may be lost. |
A functional dependency describes a relationship between attributes. If the value of one attribute or attribute set uniquely determines another attribute, we say there is a functional dependency.
In notation, X -> Y means X determines Y. If two rows have the same value of X, they must have the same value of Y.
| Dependency | Meaning |
|---|---|
Student_ID -> Student_Name |
Student ID uniquely determines student name. |
Course_ID -> Course_Name, Teacher |
Course ID uniquely determines course name and teacher. |
Teacher -> Teacher_Phone |
Teacher name determines teacher phone number, assuming teacher names are unique in this example. |
(Student_ID, Course_ID) -> Grade |
A student's grade is determined by the student-course combination. |
| Dependency Type | Meaning | Example |
|---|---|---|
| Trivial Dependency | Y is a subset of X in X -> Y. |
(Student_ID, Name) -> Name |
| Non-Trivial Dependency | Y is not a subset of X. | Student_ID -> Name |
| Full Functional Dependency | A non-key attribute depends on the whole composite key, not part of it. | (Student_ID, Course_ID) -> Grade |
| Partial Dependency | A non-key attribute depends on only part of a composite key. | Student_ID -> Student_Name in a table keyed by Student_ID and Course_ID. |
| Transitive Dependency | A non-key attribute depends on another non-key attribute. | Emp_ID -> Dept_ID and Dept_ID -> Dept_Name. |
Normalization questions often use key-related terms. These are important before studying normal forms.
| Term | Meaning |
|---|---|
| Super Key | Any attribute set that uniquely identifies a row. |
| Candidate Key | A minimal super key. |
| Primary Key | The candidate key selected as the main identifier. |
| Prime Attribute | An attribute that is part of any candidate key. |
| Non-Prime Attribute | An attribute that is not part of any candidate key. |
| Normal Form | Main Rule | Removes |
|---|---|---|
| 1NF | All attributes must be atomic; no repeating groups. | Multivalued and repeating attributes. |
| 2NF | Must be in 1NF and have no partial dependency. | Partial dependency on composite key. |
| 3NF | Must be in 2NF and have no transitive dependency. | Transitive dependency. |
| BCNF | For every non-trivial dependency X -> Y, X must be a super key. | Remaining anomalies due to overlapping candidate keys. |
| 4NF | Must be in BCNF and have no non-trivial multivalued dependency. | Independent multivalued facts in one table. |
| 5NF | Must not have non-trivial join dependencies. | Complex join dependency problems. |
A relation is in First Normal Form if every attribute contains atomic values and there are no repeating groups, arrays, or lists inside a single column.
Before 1NF: Courses column contains multiple values.
| Student_ID | Student_Name | Courses |
|---|---|---|
| 101 | Asha | DBMS, OS, Networks |
| 102 | Rahul | DBMS, Java |
After 1NF: Each course appears in a separate row.
| Student_ID | Student_Name | Course |
|---|---|---|
| 101 | Asha | DBMS |
| 101 | Asha | OS |
| 101 | Asha | Networks |
| 102 | Rahul | DBMS |
| 102 | Rahul | Java |
A relation is in Second Normal Form if it is in 1NF and every non-prime attribute is fully functionally dependent on the whole candidate key. 2NF mainly matters when a table has a composite key.
Before 2NF: Student_Name depends only on Student_ID, and Course_Name depends only on Course_ID.
| Student_ID | Course_ID | Student_Name | Course_Name | Grade |
|---|---|---|---|---|
| 101 | C1 | Asha | DBMS | A |
| 101 | C2 | Asha | OS | B+ |
| 102 | C1 | Rahul | DBMS | A- |
After 2NF: Split student, course, and enrollment facts.
| Student_ID | Student_Name |
|---|---|
| 101 | Asha |
| 102 | Rahul |
| Course_ID | Course_Name |
|---|---|
| C1 | DBMS |
| C2 | OS |
| Student_ID | Course_ID | Grade |
|---|---|---|
| 101 | C1 | A |
| 101 | C2 | B+ |
| 102 | C1 | A- |
A relation is in Third Normal Form if it is in 2NF and no non-prime attribute depends on another non-prime attribute. In simple words, non-key attributes should depend only on the key, not on other non-key attributes.
Before 3NF: Dept_Name depends on Dept_ID, not directly on Emp_ID.
| Emp_ID | Emp_Name | Dept_ID | Dept_Name | Dept_Location |
|---|---|---|---|---|
| 1 | Alice | D1 | Engineering | Pune |
| 2 | Bob | D1 | Engineering | Pune |
| 3 | Charlie | D2 | Marketing | Delhi |
After 3NF: Employee and department facts are separated.
| Emp_ID | Emp_Name | Dept_ID |
|---|---|---|
| 1 | Alice | D1 |
| 2 | Bob | D1 |
| 3 | Charlie | D2 |
| Dept_ID | Dept_Name | Dept_Location |
|---|---|---|
| D1 | Engineering | Pune |
| D2 | Marketing | Delhi |
BCNF is a stricter version of 3NF. A relation is in BCNF if, for every non-trivial functional dependency X -> Y, X is a super key.
BCNF handles cases where a table is in 3NF but still has anomalies due to overlapping candidate keys or unusual dependency patterns.
| Normal Form | Rule | Easy Memory Line |
|---|---|---|
| 3NF | For every X -> A, either X is a super key, A is prime, or dependency is trivial. | Allows some dependencies where the dependent attribute is prime. |
| BCNF | For every X -> Y, X must be a super key. | Every determinant must be a super key. |
A relation is in Fourth Normal Form if it is in BCNF and has no non-trivial multivalued dependency. A multivalued dependency occurs when one attribute determines multiple independent sets of values.
| Student_ID | Skill | Hobby |
|---|---|---|
| 101 | Java | Music |
| 101 | Java | Cricket |
| 101 | Python | Music |
| 101 | Python | Cricket |
Since skills and hobbies are independent facts, they should be stored in separate tables.
Fifth Normal Form, also called Project-Join Normal Form, deals with join dependencies. A table is in 5NF when it cannot be decomposed further into smaller tables without losing information, except when the decomposition is logically necessary.
5NF is less common in beginner-level database design but is important in complex many-to-many-to-many relationships.
Decomposition means splitting one relation into two or more relations. Good decomposition should preserve data meaning and allow the original relation to be reconstructed when needed.
| Property | Meaning | Why It Matters |
|---|---|---|
| Lossless Decomposition | Joining decomposed tables should recreate the original table without extra or missing rows. | Prevents information loss or false tuples. |
| Dependency Preservation | Functional dependencies should be enforceable in decomposed tables without unnecessary joins. | Keeps constraints easy to validate. |
| Type | Meaning | Result |
|---|---|---|
| Lossless | Original relation can be reconstructed exactly using joins. | Safe and preferred. |
| Lossy | Join creates missing rows, extra rows, or wrong combinations. | Unsafe and should be avoided. |
The following SQL creates normalized student, course, and enrollment tables. Student details, course details, and grades are stored separately.
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
course_id VARCHAR(10) PRIMARY KEY,
course_name VARCHAR(100) NOT NULL
);
CREATE TABLE enrollments (
student_id INT,
course_id VARCHAR(10),
grade VARCHAR(5),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
In normalized databases, related data is often retrieved using joins.
SELECT
s.student_id,
s.student_name,
c.course_name,
e.grade
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id
ORDER BY s.student_id, c.course_name;
Denormalization is the process of intentionally adding controlled redundancy to improve read performance. It is commonly used in reporting systems, dashboards, analytics, and data warehouses.
| Normalization | Denormalization |
|---|---|
| Reduces redundancy. | Adds controlled redundancy. |
| Improves update consistency. | Improves read/query performance. |
| Uses more joins for reports. | Reduces joins by storing precombined data. |
| Best for transactional systems. | Often used in analytical systems. |
| Mistake | Correct Understanding |
|---|---|
| Thinking 1NF only means having a primary key. | 1NF mainly requires atomic values and no repeating groups. |
| Applying 2NF to a table with a single-column key unnecessarily. | Partial dependency matters only when candidate keys are composite. |
| Confusing partial dependency with transitive dependency. | Partial dependency depends on part of a composite key; transitive dependency goes through a non-key attribute. |
| Splitting tables without checking lossless join. | Good decomposition must allow exact reconstruction when joined. |
| Assuming higher normalization is always faster. | Normalization improves integrity, but too many joins can affect read performance. |
| Denormalizing too early. | First design clean normalized tables, then denormalize only for measured performance needs. |
| Question | Short Answer |
|---|---|
| What is normalization? | It is the process of organizing relational tables to reduce redundancy and improve integrity. |
| What are anomalies? | Problems caused by poor table design, such as update, insertion, and deletion anomalies. |
| What is a functional dependency? | A relationship where one attribute or attribute set determines another attribute. |
| What is 1NF? | A table is in 1NF when all values are atomic and there are no repeating groups. |
| What is 2NF? | A table is in 2NF when it is in 1NF and has no partial dependency. |
| What is 3NF? | A table is in 3NF when it is in 2NF and has no transitive dependency. |
| What is BCNF? | A stronger form of 3NF where every determinant must be a super key. |
| What is lossless decomposition? | A decomposition where joining decomposed tables recreates the original table exactly. |
| What is dependency preservation? | Functional dependencies can be enforced in decomposed tables without unnecessary joins. |
| What is denormalization? | Intentionally adding redundancy to improve read performance in selected cases. |
Explore 500+ free tutorials across 20+ languages and frameworks.