Introduction to Normalization in SQL

February 6, 2023

Normalization in SQL – Learn about the process of Normalization in SQL and how it helps improve your database’s structure. This article provides an overview of the different types of Normalization, as well as tips on how to use it to optimize your database.

Before moving ahead let’s know a bit about Generalization & Specialization in DBMS.

What is Normalization?

Normalization is the process of organizing data systematically in a database by reducing the data redundancy through removing the duplicate data. It stores the data into tabular form and removes duplicates to clean up the data and to improve the data integrity in the table.

Important Points:

Normalization has one of the main tasks is to organize the data systematically.
It is mainly in use for reducing data redundancy.
It breaks big data tables into small sets of data and stores them into a table.
It eliminates the duplication of data to improve the data integrity.
It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.

Why do we need Normalization?

To convert the big relations data into the small set of data relations and to remove the anomalies i.e., insertion, updation and deletion that makes data difficult to read & analyze. To clean-up the data and manage it with fewer attributes to increase the searching efficiency of the data into the database.

Data Anomalies

Insertion

Insertion refers to the process of being unable to insert new data into the database.

Updation

Updation refers to the process of updation of a data required to be updated the multiple rows of data.

Deletion

Deletion refers to the process of removing/deletion of data causing loss of other important data from the database.

Key

A Key in the table is defined as the unique column name which helps to identify the record uniquely in the table. It can be one or more combinations of columns.

Department	Dept_Type	Dept_No
D1	A+	A12
D2	B-	B45
D3	AA+	H90

Where Department Column is Primary Key and Dept_No is Foreign Key.

Type of Keys

Primary Key

A Primary Key is the set of one or more column names which identify the records uniquely in the table. It does not contain null and duplicates.

Example:

ID	Roll_No	Name	Subject
D1	11	A	Math
D2	22	A	Eco
D3	33	H	Sci
D4	44	R	Math
D5	55	T	Eco

In the above table, column ID and column Roll_No can be Primary Key because both columns follow the rule of Primary Key but the column Name and Subject cannot be Primary Key as there could be more than one student who is having the same Name and Subject.

Foreign Key

A Foreign key is a field in a table which is a Primary key in another table.

ID	Roll_No	Name	Subject_ID
D1	11	A	23a
D2	22	A	er5
D3	33	H	gs2
D4	44	R	gc2
D5	55	T	ht1

In the above table, the column Subject_ID is Foreign Key.

Subject_ID	Subject_Name
23a	Math
er5	Eng
gs2	Sci
gc2	Hindi
ht1	Geography

But in the above table the same column Subject_ID is a Primary Key.

Super Key

Super Key is the set of one or more than one key that can be used to identify the record uniquely in the table.

Super keys can have extra attributes that are redundant for distinct identification.

ID	Roll_No	Name	Subject
D1	11	A	Math
D2	22	A	Eco
D3	33	H	Sci
D4	44	R	Math
D5	55	T	Eco

Super Keys are given below:

[ID, Name]

[Roll_No, Name]

[Roll_No, Subject]

[ID, Name, Subject]

[Roll_No, Name, Subject]

Candidate Key

A Candidate Key is a subset of a super key set where the key which contains no redundant attribute is none other than a Candidate Key. In other words, Candidate Key is a single key or a group of multiple keys that uniquely identify rows in a table. The value for the Candidate key is unique and non-null for all tuples.

ID	Roll_No	Name	Subject
D1	11	A	Math
D2	22	A	Eco
D3	33	H	Sci
D4	44	R	Math
D5	55	T	Eco

Candidate Keys are given below:

[ID, Roll_No]

[ID, Name]

[Roll_No, Name, Subject]

[ID, Name, Subject]

Types of Normal Forms

1st Normal Form (1NF)

This Normal Form is known for dealing with atomicity i.e., values in the table should not be further divided. In other words, a single row cannot hold the multiple values and must contain the unique value in the table.

Name	Class	Roll No	Subject
A	I	01	Math
B	II	02 03	Eco Sci
C	III	04	Eng

In the above table, we can see clearly that columns Roll No and Subject contain multiple values that violate the 1st NF. Now if we apply the rule of 1st NF, see the table below:

Name	Class	Roll No	Subject
A	I	01	Math
B	II	02	Eco
B	II	03	Sci
C	III	04	Eng

Now after applying the 1st NF, we have received a table containing a unique value in each row.

2nd Normal Form (2NF)

The primary condition of 2nd NF is that it should contain the 1st NF and in addition, it should not contain partial dependency i.e., non-prime attributes should not be dependent on any candidate key’s proper subset.

City_ID	City_Name	Measurement (in KMS)
A1u	New Delhi	500
B5u	NY	700
B8e	LA	800
C9d	Texas	204

{City_ID and City_Name} → Candidate Key

{Measurement (in KMS)} → Non-prime attribute

In the above table, the primary keys are City_ID and City_Name. The non-key attribute is

Measurement (in KMS). In this case the Measurement (in KMS) depends on only the City_Name primary key. This is why the table doesn’t satisfy the 2nd NF.

To apply the 2nd NF in this table, we need to break the table into two parts.

City_ID	City_Name
A1u	New Delhi
B5u	NY
B8e	LA
C9d	Texas

City_Name	Measurement (in KMS)
New Delhi	500
NY	700
LA	800
Texas	204

As you can see we have removed the partial functional dependency that we initially had. Now, in the table, the column Measurement (in KMS) is fully dependent on the primary key (City_Name) of that table.

3rd Normal Form (3NF)

Before proceeding toward 3rd Normal Form, data should contain 2nd NF in the table. In addition to fulfilling the condition of 3rd NF, data should not contain functional transitive dependency for non-prime attributes i.e. There should not be any dependency of any non-prime attributes on other non-prime attributes in the table. So a functional transitive dependency is in which X → Z (X determines Z) indirectly, by sequence of X → Y and Y → Z (where it is not the case that Y → X).

Transaction Date	Transaction_ID	Transaction Amount*	Transaction Tax
01-Jan-23	yghy_y1w_qaw	1200	@18%
02-Jan-23	ubgf_out_opz	7649	@20%
03-Jan-23	lgvn_jtf_cea	6319	@14%

*without tax amount

In the above table, Transaction Date = Primary Key. But, Transaction Amount* → Transaction Tax. This implies that we have a transitive functional dependency, and this structure does not satisfy the third normal form.

Now in order to apply the 3rd NF, we need to divide the table as following:

Transaction Date	Transaction_ID
01-Jan-23	yghy_y1w_qaw
02-Jan-23	ubgf_out_opz
03-Jan-23	lgvn_jtf_cea

Transaction Amount*	Transaction Tax
1200	@18%
7649	@20%
6319	@14%

*without tax amount

As you can see from the above tables all the non-key attributes are now fully functional dependent only on the primary key. Now, Transaction Tax is only dependent on Transaction Amount*.

Boyce Codd Normal Form (BCNF)

Before proceeding for BCNF, the table should be followed by the 3rd NF. If each functional dependency entry shows that X → Y, where X is a Super Key.

Emp_ID	Country	Department	Dept_Type	Dept_No
0121	India	D1	A+	A12
0231	USA	D2	B-	B45
0341	Germany	D3	AA+	H90

Functional Dependencies

Emp_ID → Country
Department → Dept_Type
Department → Dept_No
Emp_ID → Department

{Emp_ID and Department} are the candidate keys.

Emp_ID	Country
0121	India
0231	USA
0341	Germany

In the above table, Emp_ID → Country. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.

Department	Dept_Type	Dept_No
D1	A+	A12
D2	B-	B45
D3	AA+	H90

In the above table, Department → Department_Type and Dept_Type → Dept_No. This implies that we have a functional dependency, where X = Y = Z, in which X is candidate key of Y and Y is a candidate key of Z.

Emp_ID	Department
0121	D1
0231	D2
0341	D3

In the above table, Emp_ID → Department. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.

LHS of each functional dependency should be a candidate or super key, for the table to be in the BCNF.

Please comment below if you need help with something in the above-discussed topic and have further questions.

Connect on: