Normalization 1NF, 2NF, 3NF, BCNF: Tutorial, Examples, FAQs & Interview Tips

Normalization in DBMS

Normalization is the process of organizing data in a relational database to reduce redundancy, avoid anomalies, and improve data integrity. It divides large, poorly structured tables into smaller, well-related tables using rules called normal forms.

A normalized database stores each fact in the right place. For example, a student's name should be stored in a student table, course details should be stored in a course table, and marks should be stored in an enrollment or result table.

Simple idea: If the same department name is repeated in hundreds of employee rows, one spelling mistake can create inconsistent data. Normalization separates department details into a department table and links employees using a department ID.

Goals of Normalization

Reduce duplicate data stored in multiple places.
Prevent update, insertion, and deletion anomalies.
Keep data dependencies logically correct.
Improve data consistency and integrity.
Make database structure easier to maintain.
Support clean relationships using primary keys and foreign keys.

Problems Without Normalization

A table that stores many different facts together can create redundancy and anomalies. Consider this unnormalized table:

Student_ID	Student_Name	Course_ID	Course_Name	Teacher	Teacher_Phone	Grade
101	Asha	C1	DBMS	Dr. Rao	90001	A
102	Rahul	C1	DBMS	Dr. Rao	90001	B
101	Asha	C2	Networks	Prof. Sen	90002	A-

Types of Anomalies

Anomaly	Meaning	Example
Update Anomaly	The same fact must be updated in many rows.	If Dr. Rao changes phone number, every DBMS row must be updated.
Insertion Anomaly	Some data cannot be inserted until unrelated data exists.	A new course cannot be stored until at least one student enrolls.
Deletion Anomaly	Deleting one row accidentally removes other useful facts.	If the only student in a course is deleted, course and teacher details may be lost.

Functional Dependency

A functional dependency describes a relationship between attributes. If the value of one attribute or attribute set uniquely determines another attribute, we say there is a functional dependency.

In notation, X -> Y means X determines Y. If two rows have the same value of X, they must have the same value of Y.

Dependency	Meaning
`Student_ID -> Student_Name`	Student ID uniquely determines student name.
`Course_ID -> Course_Name, Teacher`	Course ID uniquely determines course name and teacher.
`Teacher -> Teacher_Phone`	Teacher name determines teacher phone number, assuming teacher names are unique in this example.
`(Student_ID, Course_ID) -> Grade`	A student's grade is determined by the student-course combination.

Types of Functional Dependencies

Dependency Type	Meaning	Example
Trivial Dependency	Y is a subset of X in `X -> Y`.	`(Student_ID, Name) -> Name`
Non-Trivial Dependency	Y is not a subset of X.	`Student_ID -> Name`
Full Functional Dependency	A non-key attribute depends on the whole composite key, not part of it.	`(Student_ID, Course_ID) -> Grade`
Partial Dependency	A non-key attribute depends on only part of a composite key.	`Student_ID -> Student_Name` in a table keyed by Student_ID and Course_ID.
Transitive Dependency	A non-key attribute depends on another non-key attribute.	`Emp_ID -> Dept_ID` and `Dept_ID -> Dept_Name`.

Keys and Prime Attributes

Normalization questions often use key-related terms. These are important before studying normal forms.

Term	Meaning
Super Key	Any attribute set that uniquely identifies a row.
Candidate Key	A minimal super key.
Primary Key	The candidate key selected as the main identifier.
Prime Attribute	An attribute that is part of any candidate key.
Non-Prime Attribute	An attribute that is not part of any candidate key.

Normal Forms Overview

Normal Form	Main Rule	Removes
1NF	All attributes must be atomic; no repeating groups.	Multivalued and repeating attributes.
2NF	Must be in 1NF and have no partial dependency.	Partial dependency on composite key.
3NF	Must be in 2NF and have no transitive dependency.	Transitive dependency.
BCNF	For every non-trivial dependency X -> Y, X must be a super key.	Remaining anomalies due to overlapping candidate keys.
4NF	Must be in BCNF and have no non-trivial multivalued dependency.	Independent multivalued facts in one table.
5NF	Must not have non-trivial join dependencies.	Complex join dependency problems.

First Normal Form (1NF)

A relation is in First Normal Form if every attribute contains atomic values and there are no repeating groups, arrays, or lists inside a single column.

Rule: Each cell should contain exactly one value.

Before 1NF: Courses column contains multiple values.

Student_ID	Student_Name	Courses
101	Asha	DBMS, OS, Networks
102	Rahul	DBMS, Java

After 1NF: Each course appears in a separate row.

Student_ID	Student_Name	Course
101	Asha	DBMS
101	Asha	OS
101	Asha	Networks
102	Rahul	DBMS
102	Rahul	Java

Second Normal Form (2NF)

A relation is in Second Normal Form if it is in 1NF and every non-prime attribute is fully functionally dependent on the whole candidate key. 2NF mainly matters when a table has a composite key.

Rule: Remove partial dependencies from composite-key tables.

Before 2NF: Student_Name depends only on Student_ID, and Course_Name depends only on Course_ID.

Student_ID	Course_ID	Student_Name	Course_Name	Grade
101	C1	Asha	DBMS	A
101	C2	Asha	OS	B+
102	C1	Rahul	DBMS	A-

After 2NF: Split student, course, and enrollment facts.

Student_ID	Student_Name
101	Asha
102	Rahul

Course_ID	Course_Name
C1	DBMS
C2	OS

Student_ID	Course_ID	Grade
101	C1	A
101	C2	B+
102	C1	A-

Third Normal Form (3NF)

A relation is in Third Normal Form if it is in 2NF and no non-prime attribute depends on another non-prime attribute. In simple words, non-key attributes should depend only on the key, not on other non-key attributes.

Rule: Remove transitive dependencies.

Before 3NF: Dept_Name depends on Dept_ID, not directly on Emp_ID.

Emp_ID	Emp_Name	Dept_ID	Dept_Name	Dept_Location
1	Alice	D1	Engineering	Pune
2	Bob	D1	Engineering	Pune
3	Charlie	D2	Marketing	Delhi

After 3NF: Employee and department facts are separated.

Emp_ID	Emp_Name	Dept_ID
1	Alice	D1
2	Bob	D1
3	Charlie	D2

Dept_ID	Dept_Name	Dept_Location
D1	Engineering	Pune
D2	Marketing	Delhi

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. A relation is in BCNF if, for every non-trivial functional dependency X -> Y, X is a super key.

BCNF handles cases where a table is in 3NF but still has anomalies due to overlapping candidate keys or unusual dependency patterns.

Normal Form	Rule	Easy Memory Line
3NF	For every X -> A, either X is a super key, A is prime, or dependency is trivial.	Allows some dependencies where the dependent attribute is prime.
BCNF	For every X -> Y, X must be a super key.	Every determinant must be a super key.

Fourth Normal Form (4NF)

A relation is in Fourth Normal Form if it is in BCNF and has no non-trivial multivalued dependency. A multivalued dependency occurs when one attribute determines multiple independent sets of values.

Example: If a student can have many skills and many hobbies independently, storing Student_ID, Skill, and Hobby in one table creates unnecessary combinations. Split them into Student_Skill and Student_Hobby.

Student_ID	Skill	Hobby
101	Java	Music
101	Java	Cricket
101	Python	Music
101	Python	Cricket

Since skills and hobbies are independent facts, they should be stored in separate tables.

Fifth Normal Form (5NF)

Fifth Normal Form, also called Project-Join Normal Form, deals with join dependencies. A table is in 5NF when it cannot be decomposed further into smaller tables without losing information, except when the decomposition is logically necessary.

5NF is less common in beginner-level database design but is important in complex many-to-many-to-many relationships.

Decomposition in Normalization

Decomposition means splitting one relation into two or more relations. Good decomposition should preserve data meaning and allow the original relation to be reconstructed when needed.

Property	Meaning	Why It Matters
Lossless Decomposition	Joining decomposed tables should recreate the original table without extra or missing rows.	Prevents information loss or false tuples.
Dependency Preservation	Functional dependencies should be enforceable in decomposed tables without unnecessary joins.	Keeps constraints easy to validate.

Lossless vs Lossy Decomposition

Type	Meaning	Result
Lossless	Original relation can be reconstructed exactly using joins.	Safe and preferred.
Lossy	Join creates missing rows, extra rows, or wrong combinations.	Unsafe and should be avoided.

SQL Example: Normalized College Tables

The following SQL creates normalized student, course, and enrollment tables. Student details, course details, and grades are stored separately.

normalized-college-schema.sql

CREATE TABLE students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    course_id VARCHAR(10) PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

CREATE TABLE enrollments (
    student_id INT,
    course_id VARCHAR(10),
    grade VARCHAR(5),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

SQL Example: Querying Normalized Tables

In normalized databases, related data is often retrieved using joins.

normalized-query.sql

SELECT
    s.student_id,
    s.student_name,
    c.course_name,
    e.grade
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id
ORDER BY s.student_id, c.course_name;

Denormalization

Denormalization is the process of intentionally adding controlled redundancy to improve read performance. It is commonly used in reporting systems, dashboards, analytics, and data warehouses.

Normalization	Denormalization
Reduces redundancy.	Adds controlled redundancy.
Improves update consistency.	Improves read/query performance.
Uses more joins for reports.	Reduces joins by storing precombined data.
Best for transactional systems.	Often used in analytical systems.

Normalization Step-by-Step Approach

Start with the unnormalized table and list all attributes.
Identify candidate keys and functional dependencies.
Apply 1NF by removing repeating groups and non-atomic values.
Apply 2NF by removing partial dependencies from composite-key tables.
Apply 3NF by removing transitive dependencies.
Check BCNF if there are multiple candidate keys or unusual dependencies.
Check 4NF if independent multivalued facts exist in the same table.
Verify lossless decomposition and dependency preservation.
Add primary keys, foreign keys, unique constraints, and check constraints.
Review query performance and denormalize only when there is a proven need.

Common Mistakes

Mistake	Correct Understanding
Thinking 1NF only means having a primary key.	1NF mainly requires atomic values and no repeating groups.
Applying 2NF to a table with a single-column key unnecessarily.	Partial dependency matters only when candidate keys are composite.
Confusing partial dependency with transitive dependency.	Partial dependency depends on part of a composite key; transitive dependency goes through a non-key attribute.
Splitting tables without checking lossless join.	Good decomposition must allow exact reconstruction when joined.
Assuming higher normalization is always faster.	Normalization improves integrity, but too many joins can affect read performance.
Denormalizing too early.	First design clean normalized tables, then denormalize only for measured performance needs.

Interview and Exam Questions

Question	Short Answer
What is normalization?	It is the process of organizing relational tables to reduce redundancy and improve integrity.
What are anomalies?	Problems caused by poor table design, such as update, insertion, and deletion anomalies.
What is a functional dependency?	A relationship where one attribute or attribute set determines another attribute.
What is 1NF?	A table is in 1NF when all values are atomic and there are no repeating groups.
What is 2NF?	A table is in 2NF when it is in 1NF and has no partial dependency.
What is 3NF?	A table is in 3NF when it is in 2NF and has no transitive dependency.
What is BCNF?	A stronger form of 3NF where every determinant must be a super key.
What is lossless decomposition?	A decomposition where joining decomposed tables recreates the original table exactly.
What is dependency preservation?	Functional dependencies can be enforced in decomposed tables without unnecessary joins.
What is denormalization?	Intentionally adding redundancy to improve read performance in selected cases.

Quick Revision Notes

Normalization reduces redundancy and prevents anomalies.
Anomalies include update, insertion, and deletion anomalies.
Functional dependency X -> Y means X determines Y.
1NF removes repeating groups and non-atomic values.
2NF removes partial dependency from composite-key tables.
3NF removes transitive dependency.
BCNF requires every determinant to be a super key.
4NF removes independent multivalued dependencies.
Good decomposition should be lossless and preferably dependency preserving.
Denormalization is useful only when controlled redundancy improves performance.

Key Takeaways

Normalization organizes relational tables to reduce redundancy and improve data integrity.
Poor table design can cause update, insertion, and deletion anomalies.
Functional dependencies are the foundation for identifying normal forms.
1NF requires atomic values and no repeating groups.
2NF requires 1NF and no partial dependency.
3NF requires 2NF and no transitive dependency.
BCNF requires every determinant to be a super key.
4NF handles independent multivalued dependencies.
Good decomposition should be lossless and preferably dependency preserving.
Denormalization intentionally adds controlled redundancy for read performance when needed.