Tutorials Logic, IN info@tutorialslogic.com
Navigation
Home About Us Contact Us Blogs FAQs
Tutorials
All Tutorials
Services
Academic Projects Resume Writing Website Development
Practice
Quiz Challenge Interview Questions Certification Practice
Tools
Online Compiler JSON Formatter Regex Tester CSS Unit Converter Color Picker
Compiler Tools

Normalization 1NF, 2NF, 3NF, BCNF: Tutorial, Examples, FAQs & Interview Tips

Normalization in DBMS

Normalization is the process of organizing data in a relational database to reduce redundancy, avoid anomalies, and improve data integrity. It divides large, poorly structured tables into smaller, well-related tables using rules called normal forms.

A normalized database stores each fact in the right place. For example, a student's name should be stored in a student table, course details should be stored in a course table, and marks should be stored in an enrollment or result table.

Goals of Normalization

  • Reduce duplicate data stored in multiple places.
  • Prevent update, insertion, and deletion anomalies.
  • Keep data dependencies logically correct.
  • Improve data consistency and integrity.
  • Make database structure easier to maintain.
  • Support clean relationships using primary keys and foreign keys.

Problems Without Normalization

A table that stores many different facts together can create redundancy and anomalies. Consider this unnormalized table:

Student_ID Student_Name Course_ID Course_Name Teacher Teacher_Phone Grade
101 Asha C1 DBMS Dr. Rao 90001 A
102 Rahul C1 DBMS Dr. Rao 90001 B
101 Asha C2 Networks Prof. Sen 90002 A-

Types of Anomalies

Anomaly Meaning Example
Update Anomaly The same fact must be updated in many rows. If Dr. Rao changes phone number, every DBMS row must be updated.
Insertion Anomaly Some data cannot be inserted until unrelated data exists. A new course cannot be stored until at least one student enrolls.
Deletion Anomaly Deleting one row accidentally removes other useful facts. If the only student in a course is deleted, course and teacher details may be lost.

Functional Dependency

A functional dependency describes a relationship between attributes. If the value of one attribute or attribute set uniquely determines another attribute, we say there is a functional dependency.

In notation, X -> Y means X determines Y. If two rows have the same value of X, they must have the same value of Y.

Dependency Meaning
Student_ID -> Student_Name Student ID uniquely determines student name.
Course_ID -> Course_Name, Teacher Course ID uniquely determines course name and teacher.
Teacher -> Teacher_Phone Teacher name determines teacher phone number, assuming teacher names are unique in this example.
(Student_ID, Course_ID) -> Grade A student's grade is determined by the student-course combination.

Types of Functional Dependencies

Dependency Type Meaning Example
Trivial Dependency Y is a subset of X in X -> Y. (Student_ID, Name) -> Name
Non-Trivial Dependency Y is not a subset of X. Student_ID -> Name
Full Functional Dependency A non-key attribute depends on the whole composite key, not part of it. (Student_ID, Course_ID) -> Grade
Partial Dependency A non-key attribute depends on only part of a composite key. Student_ID -> Student_Name in a table keyed by Student_ID and Course_ID.
Transitive Dependency A non-key attribute depends on another non-key attribute. Emp_ID -> Dept_ID and Dept_ID -> Dept_Name.

Keys and Prime Attributes

Normalization questions often use key-related terms. These are important before studying normal forms.

Term Meaning
Super Key Any attribute set that uniquely identifies a row.
Candidate Key A minimal super key.
Primary Key The candidate key selected as the main identifier.
Prime Attribute An attribute that is part of any candidate key.
Non-Prime Attribute An attribute that is not part of any candidate key.

Normal Forms Overview

Normal Form Main Rule Removes
1NF All attributes must be atomic; no repeating groups. Multivalued and repeating attributes.
2NF Must be in 1NF and have no partial dependency. Partial dependency on composite key.
3NF Must be in 2NF and have no transitive dependency. Transitive dependency.
BCNF For every non-trivial dependency X -> Y, X must be a super key. Remaining anomalies due to overlapping candidate keys.
4NF Must be in BCNF and have no non-trivial multivalued dependency. Independent multivalued facts in one table.
5NF Must not have non-trivial join dependencies. Complex join dependency problems.

First Normal Form (1NF)

A relation is in First Normal Form if every attribute contains atomic values and there are no repeating groups, arrays, or lists inside a single column.

Before 1NF: Courses column contains multiple values.

Student_ID Student_Name Courses
101 Asha DBMS, OS, Networks
102 Rahul DBMS, Java

After 1NF: Each course appears in a separate row.

Student_ID Student_Name Course
101 Asha DBMS
101 Asha OS
101 Asha Networks
102 Rahul DBMS
102 Rahul Java

Second Normal Form (2NF)

A relation is in Second Normal Form if it is in 1NF and every non-prime attribute is fully functionally dependent on the whole candidate key. 2NF mainly matters when a table has a composite key.

Before 2NF: Student_Name depends only on Student_ID, and Course_Name depends only on Course_ID.

Student_ID Course_ID Student_Name Course_Name Grade
101 C1 Asha DBMS A
101 C2 Asha OS B+
102 C1 Rahul DBMS A-

After 2NF: Split student, course, and enrollment facts.

Student_ID Student_Name
101 Asha
102 Rahul
Course_ID Course_Name
C1 DBMS
C2 OS
Student_ID Course_ID Grade
101 C1 A
101 C2 B+
102 C1 A-

Third Normal Form (3NF)

A relation is in Third Normal Form if it is in 2NF and no non-prime attribute depends on another non-prime attribute. In simple words, non-key attributes should depend only on the key, not on other non-key attributes.

Before 3NF: Dept_Name depends on Dept_ID, not directly on Emp_ID.

Emp_ID Emp_Name Dept_ID Dept_Name Dept_Location
1 Alice D1 Engineering Pune
2 Bob D1 Engineering Pune
3 Charlie D2 Marketing Delhi

After 3NF: Employee and department facts are separated.

Emp_ID Emp_Name Dept_ID
1 Alice D1
2 Bob D1
3 Charlie D2
Dept_ID Dept_Name Dept_Location
D1 Engineering Pune
D2 Marketing Delhi

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. A relation is in BCNF if, for every non-trivial functional dependency X -> Y, X is a super key.

BCNF handles cases where a table is in 3NF but still has anomalies due to overlapping candidate keys or unusual dependency patterns.

Normal Form Rule Easy Memory Line
3NF For every X -> A, either X is a super key, A is prime, or dependency is trivial. Allows some dependencies where the dependent attribute is prime.
BCNF For every X -> Y, X must be a super key. Every determinant must be a super key.

Fourth Normal Form (4NF)

A relation is in Fourth Normal Form if it is in BCNF and has no non-trivial multivalued dependency. A multivalued dependency occurs when one attribute determines multiple independent sets of values.

Student_ID Skill Hobby
101 Java Music
101 Java Cricket
101 Python Music
101 Python Cricket

Since skills and hobbies are independent facts, they should be stored in separate tables.

Fifth Normal Form (5NF)

Fifth Normal Form, also called Project-Join Normal Form, deals with join dependencies. A table is in 5NF when it cannot be decomposed further into smaller tables without losing information, except when the decomposition is logically necessary.

5NF is less common in beginner-level database design but is important in complex many-to-many-to-many relationships.

Decomposition in Normalization

Decomposition means splitting one relation into two or more relations. Good decomposition should preserve data meaning and allow the original relation to be reconstructed when needed.

Property Meaning Why It Matters
Lossless Decomposition Joining decomposed tables should recreate the original table without extra or missing rows. Prevents information loss or false tuples.
Dependency Preservation Functional dependencies should be enforceable in decomposed tables without unnecessary joins. Keeps constraints easy to validate.

Lossless vs Lossy Decomposition

Type Meaning Result
Lossless Original relation can be reconstructed exactly using joins. Safe and preferred.
Lossy Join creates missing rows, extra rows, or wrong combinations. Unsafe and should be avoided.

SQL Example: Normalized College Tables

The following SQL creates normalized student, course, and enrollment tables. Student details, course details, and grades are stored separately.

normalized-college-schema.sql
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    course_id VARCHAR(10) PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

CREATE TABLE enrollments (
    student_id INT,
    course_id VARCHAR(10),
    grade VARCHAR(5),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

SQL Example: Querying Normalized Tables

In normalized databases, related data is often retrieved using joins.

normalized-query.sql
SELECT
    s.student_id,
    s.student_name,
    c.course_name,
    e.grade
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id
ORDER BY s.student_id, c.course_name;

Denormalization

Denormalization is the process of intentionally adding controlled redundancy to improve read performance. It is commonly used in reporting systems, dashboards, analytics, and data warehouses.

Normalization Denormalization
Reduces redundancy. Adds controlled redundancy.
Improves update consistency. Improves read/query performance.
Uses more joins for reports. Reduces joins by storing precombined data.
Best for transactional systems. Often used in analytical systems.

Normalization Step-by-Step Approach

  1. Start with the unnormalized table and list all attributes.
  2. Identify candidate keys and functional dependencies.
  3. Apply 1NF by removing repeating groups and non-atomic values.
  4. Apply 2NF by removing partial dependencies from composite-key tables.
  5. Apply 3NF by removing transitive dependencies.
  6. Check BCNF if there are multiple candidate keys or unusual dependencies.
  7. Check 4NF if independent multivalued facts exist in the same table.
  8. Verify lossless decomposition and dependency preservation.
  9. Add primary keys, foreign keys, unique constraints, and check constraints.
  10. Review query performance and denormalize only when there is a proven need.

Common Mistakes

Mistake Correct Understanding
Thinking 1NF only means having a primary key. 1NF mainly requires atomic values and no repeating groups.
Applying 2NF to a table with a single-column key unnecessarily. Partial dependency matters only when candidate keys are composite.
Confusing partial dependency with transitive dependency. Partial dependency depends on part of a composite key; transitive dependency goes through a non-key attribute.
Splitting tables without checking lossless join. Good decomposition must allow exact reconstruction when joined.
Assuming higher normalization is always faster. Normalization improves integrity, but too many joins can affect read performance.
Denormalizing too early. First design clean normalized tables, then denormalize only for measured performance needs.

Interview and Exam Questions

Question Short Answer
What is normalization? It is the process of organizing relational tables to reduce redundancy and improve integrity.
What are anomalies? Problems caused by poor table design, such as update, insertion, and deletion anomalies.
What is a functional dependency? A relationship where one attribute or attribute set determines another attribute.
What is 1NF? A table is in 1NF when all values are atomic and there are no repeating groups.
What is 2NF? A table is in 2NF when it is in 1NF and has no partial dependency.
What is 3NF? A table is in 3NF when it is in 2NF and has no transitive dependency.
What is BCNF? A stronger form of 3NF where every determinant must be a super key.
What is lossless decomposition? A decomposition where joining decomposed tables recreates the original table exactly.
What is dependency preservation? Functional dependencies can be enforced in decomposed tables without unnecessary joins.
What is denormalization? Intentionally adding redundancy to improve read performance in selected cases.

Quick Revision Notes

  • Normalization reduces redundancy and prevents anomalies.
  • Anomalies include update, insertion, and deletion anomalies.
  • Functional dependency X -> Y means X determines Y.
  • 1NF removes repeating groups and non-atomic values.
  • 2NF removes partial dependency from composite-key tables.
  • 3NF removes transitive dependency.
  • BCNF requires every determinant to be a super key.
  • 4NF removes independent multivalued dependencies.
  • Good decomposition should be lossless and preferably dependency preserving.
  • Denormalization is useful only when controlled redundancy improves performance.
Key Takeaways
  • Normalization organizes relational tables to reduce redundancy and improve data integrity.
  • Poor table design can cause update, insertion, and deletion anomalies.
  • Functional dependencies are the foundation for identifying normal forms.
  • 1NF requires atomic values and no repeating groups.
  • 2NF requires 1NF and no partial dependency.
  • 3NF requires 2NF and no transitive dependency.
  • BCNF requires every determinant to be a super key.
  • 4NF handles independent multivalued dependencies.
  • Good decomposition should be lossless and preferably dependency preserving.
  • Denormalization intentionally adds controlled redundancy for read performance when needed.

Ready to Level Up Your Skills?

Explore 500+ free tutorials across 20+ languages and frameworks.