Normalization in SQL – Learn about the process of Normalization in SQL and how it helps improve your database’s structure. This article provides an overview of the different types of Normalization, as well as tips on how to use it to optimize your database.
Before moving ahead let’s know a bit about Generalization & Specialization in DBMS.
Table of Contents
What is Normalization?
Normalization is the process of organizing data systematically in a database by reducing the data redundancy through removing the duplicate data. It stores the data into tabular form and removes duplicates to clean up the data and to improve the data integrity in the table.
Important Points:
- Normalization has one of the main tasks is to organize the data systematically.
- It is mainly in use for reducing data redundancy.
- It breaks big data tables into small sets of data and stores them into a table.
- It eliminates the duplication of data to improve the data integrity.
- It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Why do we need Normalization?
To convert the big relations data into the small set of data relations and to remove the anomalies i.e., insertion, updation and deletion that makes data difficult to read & analyze. To clean-up the data and manage it with fewer attributes to increase the searching efficiency of the data into the database.
Data Anomalies
Insertion
Insertion refers to the process of being unable to insert new data into the database.
Updation
Updation refers to the process of updation of a data required to be updated the multiple rows of data.
Deletion
Deletion refers to the process of removing/deletion of data causing loss of other important data from the database.
Key
A Key in the table is defined as the unique column name which helps to identify the record uniquely in the table. It can be one or more combinations of columns.
Department | Dept_Type | Dept_No |
D1 | A+ | A12 |
D2 | B- | B45 |
D3 | AA+ | H90 |
Where Department Column is Primary Key and Dept_No is Foreign Key.
Type of Keys
Primary Key
A Primary Key is the set of one or more column names which identify the records uniquely in the table. It does not contain null and duplicates.
Example:
ID | Roll_No | Name | Subject |
D1 | 11 | A | Math |
D2 | 22 | A | Eco |
D3 | 33 | H | Sci |
D4 | 44 | R | Math |
D5 | 55 | T | Eco |
In the above table, column ID and column Roll_No can be Primary Key because both columns follow the rule of Primary Key but the column Name and Subject cannot be Primary Key as there could be more than one student who is having the same Name and Subject.
Foreign Key
A Foreign key is a field in a table which is a Primary key in another table.
ID | Roll_No | Name | Subject_ID |
D1 | 11 | A | 23a |
D2 | 22 | A | er5 |
D3 | 33 | H | gs2 |
D4 | 44 | R | gc2 |
D5 | 55 | T | ht1 |
In the above table, the column Subject_ID is Foreign Key.
Subject_ID | Subject_Name |
23a | Math |
er5 | Eng |
gs2 | Sci |
gc2 | Hindi |
ht1 | Geography |
But in the above table the same column Subject_ID is a Primary Key.
Super Key
Super Key is the set of one or more than one key that can be used to identify the record uniquely in the table.
Super keys can have extra attributes that are redundant for distinct identification.
ID | Roll_No | Name | Subject |
D1 | 11 | A | Math |
D2 | 22 | A | Eco |
D3 | 33 | H | Sci |
D4 | 44 | R | Math |
D5 | 55 | T | Eco |
Super Keys are given below:
[ID, Name]
[Roll_No, Name]
[Roll_No, Subject]
[ID, Name, Subject]
[Roll_No, Name, Subject]
Candidate Key
A Candidate Key is a subset of a super key set where the key which contains no redundant attribute is none other than a Candidate Key. In other words, Candidate Key is a single key or a group of multiple keys that uniquely identify rows in a table. The value for the Candidate key is unique and non-null for all tuples.
ID | Roll_No | Name | Subject |
D1 | 11 | A | Math |
D2 | 22 | A | Eco |
D3 | 33 | H | Sci |
D4 | 44 | R | Math |
D5 | 55 | T | Eco |
Candidate Keys are given below:
[ID, Roll_No]
[ID, Name]
[Roll_No, Name, Subject]
[ID, Name, Subject]
Types of Normal Forms
1st Normal Form (1NF)
This Normal Form is known for dealing with atomicity i.e., values in the table should not be further divided. In other words, a single row cannot hold the multiple values and must contain the unique value in the table.
Name | Class | Roll No | Subject |
A | I | 01 | Math |
B | II | 02 03 | Eco Sci |
C | III | 04 | Eng |
In the above table, we can see clearly that columns Roll No and Subject contain multiple values that violate the 1st NF. Now if we apply the rule of 1st NF, see the table below:
Name | Class | Roll No | Subject |
A | I | 01 | Math |
B | II | 02 | Eco |
B | II | 03 | Sci |
C | III | 04 | Eng |
Now after applying the 1st NF, we have received a table containing a unique value in each row.
2nd Normal Form (2NF)
The primary condition of 2nd NF is that it should contain the 1st NF and in addition, it should not contain partial dependency i.e., non-prime attributes should not be dependent on any candidate key’s proper subset.
City_ID | City_Name | Measurement (in KMS) |
A1u | New Delhi | 500 |
B5u | NY | 700 |
B8e | LA | 800 |
C9d | Texas | 204 |
{City_ID and City_Name} → Candidate Key
{Measurement (in KMS)} → Non-prime attribute
In the above table, the primary keys are City_ID and City_Name. The non-key attribute is
Measurement (in KMS). In this case the Measurement (in KMS) depends on only the City_Name primary key. This is why the table doesn’t satisfy the 2nd NF.
To apply the 2nd NF in this table, we need to break the table into two parts.
City_ID | City_Name |
A1u | New Delhi |
B5u | NY |
B8e | LA |
C9d | Texas |
City_Name | Measurement (in KMS) |
New Delhi | 500 |
NY | 700 |
LA | 800 |
Texas | 204 |
As you can see we have removed the partial functional dependency that we initially had. Now, in the table, the column Measurement (in KMS) is fully dependent on the primary key (City_Name) of that table.
3rd Normal Form (3NF)
Before proceeding toward 3rd Normal Form, data should contain 2nd NF in the table. In addition to fulfilling the condition of 3rd NF, data should not contain functional transitive dependency for non-prime attributes i.e. There should not be any dependency of any non-prime attributes on other non-prime attributes in the table. So a functional transitive dependency is in which X → Z (X determines Z) indirectly, by sequence of X → Y and Y → Z (where it is not the case that Y → X).
Transaction Date | Transaction_ID | Transaction Amount* | Transaction Tax |
01-Jan-23 | yghy_y1w_qaw | 1200 | @18% |
02-Jan-23 | ubgf_out_opz | 7649 | @20% |
03-Jan-23 | lgvn_jtf_cea | 6319 | @14% |
*without tax amount
In the above table, Transaction Date = Primary Key. But, Transaction Amount* → Transaction Tax. This implies that we have a transitive functional dependency, and this structure does not satisfy the third normal form.
Now in order to apply the 3rd NF, we need to divide the table as following:
Transaction Date | Transaction_ID |
01-Jan-23 | yghy_y1w_qaw |
02-Jan-23 | ubgf_out_opz |
03-Jan-23 | lgvn_jtf_cea |
Transaction Amount* | Transaction Tax |
1200 | @18% |
7649 | @20% |
6319 | @14% |
*without tax amount
As you can see from the above tables all the non-key attributes are now fully functional dependent only on the primary key. Now, Transaction Tax is only dependent on Transaction Amount*.
Boyce Codd Normal Form (BCNF)
Before proceeding for BCNF, the table should be followed by the 3rd NF. If each functional dependency entry shows that X → Y, where X is a Super Key.
Emp_ID |
Country |
Department |
Dept_Type |
Dept_No |
0121 |
India |
D1 |
A+ |
A12 |
0231 |
USA |
D2 |
B- |
B45 |
0341 |
Germany |
D3 |
AA+ |
H90 |
Functional Dependencies
- Emp_ID → Country
- Department → Dept_Type
- Department → Dept_No
- Emp_ID → Department
{Emp_ID and Department} are the candidate keys.
Emp_ID | Country |
0121 | India |
0231 | USA |
0341 | Germany |
In the above table, Emp_ID → Country. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.
Department | Dept_Type | Dept_No |
D1 | A+ | A12 |
D2 | B- | B45 |
D3 | AA+ | H90 |
In the above table, Department → Department_Type and Dept_Type → Dept_No. This implies that we have a functional dependency, where X = Y = Z, in which X is candidate key of Y and Y is a candidate key of Z.
Emp_ID | Department |
0121 | D1 |
0231 | D2 |
0341 | D3 |
In the above table, Emp_ID → Department. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.
LHS of each functional dependency should be a candidate or super key, for the table to be in the BCNF.
Please comment below if you need help with something in the above-discussed topic and have further questions.
Connect on: