codingstreets
Search
Close this search box.
introduction-to-normalization-in-sql

Introduction to Normalization in SQL

Normalization in SQL – Learn about the process of Normalization in SQL and how it helps improve your database’s structure. This article provides an overview of the different types of Normalization, as well as tips on how to use it to optimize your database.

Before moving ahead let’s know a bit about Generalization & Specialization in DBMS.

Table of Contents

What is Normalization?

Normalization is the process of organizing data systematically in a database by reducing the data redundancy through removing the duplicate data. It stores the data into tabular form and removes duplicates to clean up the data and to improve the data integrity in the table.

Important Points:

  1. Normalization has one of the main tasks is to organize the data systematically.
  2. It is mainly in use for reducing data redundancy.
  3. It breaks big data tables into small sets of data and stores them into a table.
  4. It eliminates the duplication of data to improve the data integrity.
  5. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.

Why do we need Normalization?

To convert the big relations data into the small set of data relations and to remove the anomalies i.e., insertion, updation and deletion that makes data difficult to read & analyze. To clean-up the data and manage it with fewer attributes to increase the searching efficiency of the data into the database.

Data Anomalies

Insertion

Insertion refers to the process of being unable to insert new data into the database.

Updation

Updation refers to the process of updation of a data required to be updated the multiple rows of data. 

Deletion 

Deletion refers to the process of removing/deletion of data causing loss of other important data from the database.

Key

A Key in the table is defined as the unique column name which helps to identify the record uniquely in the table. It can be one or more combinations of columns.

Department

Dept_Type

Dept_No

D1

A+

A12

D2

B-

B45

D3

AA+

H90

Where Department Column is Primary Key and Dept_No is Foreign Key.

Type of Keys

Primary Key

A Primary Key is the set of one or more column names which identify the records uniquely in the table. It does not contain null and duplicates.

Example:

ID

Roll_No

Name

Subject

D1

11

A

Math

D2

22

A

Eco

D3

33

H

Sci

D4

44

R

Math

D5

55

T

Eco

In the above table, column ID and column Roll_No can be Primary Key because both columns follow the rule of Primary Key but the column Name and Subject cannot be Primary Key as there could be more than one student who is having the same Name and Subject.

Foreign Key

A Foreign key is a field in a table which is a Primary key in another table.

ID

Roll_No

Name

Subject_ID

D1

11

A

23a

D2

22

A

er5

D3

33

H

gs2

D4

44

R

gc2

D5

55

T

ht1

In the above table, the column Subject_ID is Foreign Key.

Subject_ID

Subject_Name

23a

Math

er5

Eng

gs2

Sci

gc2

Hindi

ht1

Geography

But in the above table the same column Subject_ID is a Primary Key.

Super Key

Super Key is the set of one or more than one key that can be used to identify the record uniquely in the table.

Super keys can have extra attributes that are redundant for distinct identification.

ID

Roll_No

Name

Subject

D1

11

A

Math

D2

22

A

Eco

D3

33

H

Sci

D4

44

R

Math

D5

55

T

Eco

Super Keys are given below:

[ID, Name]

[Roll_No, Name]

[Roll_No, Subject]

[ID, Name, Subject]

[Roll_No, Name, Subject]

Candidate Key

A Candidate Key is a subset of a super key set where the key which contains no redundant attribute is none other than a Candidate Key. In other words, Candidate Key is a single key or a group of multiple keys that uniquely identify rows in a table. The value for the Candidate key is unique and non-null for all tuples.

ID

Roll_No

Name

Subject

D1

11

A

Math

D2

22

A

Eco

D3

33

H

Sci

D4

44

R

Math

D5

55

T

Eco

Candidate Keys are given below:

[ID, Roll_No]

[ID, Name]

[Roll_No, Name, Subject]

[ID, Name, Subject]

Types of Normal Forms

1st Normal Form (1NF)

This Normal Form is known for dealing with atomicity i.e., values in the table should not be further divided. In other words, a single row cannot hold the multiple values and must contain the unique value in the table.

Name

Class

Roll No

Subject

A

I

01

Math

B

II

02

03

Eco

Sci

C

III

04

Eng

In the above table, we can see clearly that columns Roll No and Subject contain multiple values that violate the 1st NF. Now if we apply the rule of 1st NF, see the table below:

Name

Class

Roll No

Subject

A

I

01

Math

B

II

02

Eco

B

II

03

Sci

C

III

04

Eng

Now after applying the 1st NF, we have received a table containing a unique value in each row.

2nd Normal Form (2NF)

The primary condition of 2nd NF is that it should contain the 1st NF and in addition, it should not contain partial dependency i.e., non-prime attributes should not be dependent on any candidate key’s proper subset.

City_ID

City_Name

Measurement (in KMS)

A1u

New Delhi

500

B5u

NY

700

B8e

LA

800

C9d

Texas

204

{City_ID and City_Name} → Candidate Key

{Measurement (in KMS)} → Non-prime attribute

In the above table, the primary keys are City_ID and City_Name. The non-key attribute is 

Measurement (in KMS). In this case the Measurement (in KMS) depends on only the City_Name primary key. This is why the table doesn’t satisfy the 2nd NF.

To apply the 2nd NF in this table, we need to break the table into two parts.

City_ID

City_Name

A1u

New Delhi

B5u

NY

B8e

LA

C9d

Texas

 

City_Name

Measurement (in KMS)

New Delhi

500

NY

700

LA

800

Texas

204

As you can see we have removed the partial functional dependency that we initially had. Now, in the table, the column Measurement (in KMS) is fully dependent on the primary key (City_Name) of that table.

3rd Normal Form (3NF)

Before proceeding toward 3rd Normal Form, data should contain 2nd NF in the table. In addition to fulfilling the condition of 3rd NF, data should not contain functional transitive dependency for non-prime attributes i.e. There should not be any dependency of any non-prime attributes on other non-prime attributes in the table. So a functional transitive dependency is in which X → Z (X determines Z) indirectly, by sequence of X → Y and Y → Z (where it is not the case that Y → X).

Transaction Date

Transaction_ID

Transaction Amount*

Transaction Tax

01-Jan-23

yghy_y1w_qaw

1200

@18%

02-Jan-23

ubgf_out_opz

7649

@20%

03-Jan-23

lgvn_jtf_cea

6319

@14%

*without tax amount

In the above table, Transaction Date = Primary Key. But, Transaction Amount* Transaction Tax. This implies that we have a transitive functional dependency, and this structure does not satisfy the third normal form.

Now in order to apply the 3rd NF, we need to divide the table as following:

Transaction Date

Transaction_ID

01-Jan-23

yghy_y1w_qaw

02-Jan-23

ubgf_out_opz

03-Jan-23

lgvn_jtf_cea

Transaction Amount*

Transaction Tax

1200

@18%

7649

@20%

6319

@14%

*without tax amount

As you can see from the above tables all the non-key attributes are now fully functional dependent only on the primary key. Now, Transaction Tax is only dependent on Transaction Amount*.

Boyce Codd Normal Form (BCNF)

Before proceeding for BCNF, the table should be followed by the 3rd NF. If each functional dependency entry shows that X → Y, where X is a Super Key.

Emp_ID

Country

Department

Dept_Type

Dept_No

0121

India

D1

A+

A12

0231

USA

D2

B-

B45

0341

Germany

D3

AA+

 H90

Functional Dependencies

  • Emp_ID → Country
  • Department → Dept_Type
  • Department → Dept_No
  • Emp_ID → Department

{Emp_ID and Department} are the candidate keys.

Emp_ID

Country

0121

India

0231

USA

0341

Germany

In the above table, Emp_ID Country. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.

Department

Dept_Type

Dept_No

D1

A+

A12

D2

B-

B45

D3

AA+

H90

In the above table, Department → Department_Type and Dept_Type → Dept_No. This implies that we have a functional dependency, where X = Y = Z, in which X is candidate key of Y and Y is a candidate key of Z. 

Emp_ID

Department

0121

D1

0231

D2

0341

D3

In the above table, Emp_ID Department. This implies that we have a functional dependency, where X → Y and X (Emp_ID) is candidate key.

LHS of each functional dependency should be a candidate or super key, for the table to be in the BCNF.

Please comment below if you need help with something in the above-discussed topic and have further questions.

Connect on:

Recent Post

Popular Post

Top Articles

Archives
Categories

Share on