Introduction to Python Pandas DataFrame Operations

October 23, 2021

In this article, you will learn about Python Pandas DataFrame operations for cleaning data: fixing data in the wrong format, fixing wrong data, and removing duplicates.

Data of Wrong Format

Cells with data in the wrong format can make a data set difficult, or even impossible, to analyze.

To fix this, you have two options: either remove the affected rows, or convert all cells in the column into the same format.

Convert Into a Correct Format

Let’s convert all cells in the ‘Date’ column into dates.

Pandas provides the to_datetime() function for this.

Example – Convert the ‘Date’ column to the correct date format.

import pandas as pd

file = pd.read_csv('data.csv')

# Parse every cell of the 'Date' column into a datetime value
file['Date'] = pd.to_datetime(file['Date'])

print(file.to_string())
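As a self-contained sketch of the same conversion, the snippet below builds a small DataFrame in memory instead of reading data.csv; the sample values are made up for illustration. It also passes errors="coerce", which the article's example does not use, so that a value that cannot be parsed becomes NaT rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv.
df = pd.DataFrame({
    "Date": ["2021/10/01", "2021/10/02", None, "not a date"],
    "Marks": [60, 75, 68, 71],
})

# errors="coerce" turns values that cannot be parsed into NaT instead of raising.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

print(df.to_string())
```

Here both the empty cell and the unparseable string end up as NaT, so they can be handled like any other missing value.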

Removing Rows

The conversion in the previous example leaves a NaT (Not a Time) value in the ‘Date’ column. Pandas treats NaT as a null value, so we can delete that row with the dropna() method.

import pandas as pd

file = pd.read_csv('data.csv')

file['Date'] = pd.to_datetime(file['Date'])

# Drop every row whose 'Date' is null (NaT), modifying file directly
file.dropna(subset=['Date'], inplace = True)

print(file.to_string())
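To see what subset=['Date'] does, here is a minimal sketch on made-up in-memory data (not the article's data.csv). It assumes you would rather keep the original DataFrame, so it omits inplace and works with the returned copy instead.

```python
import pandas as pd

# Hypothetical data: one date is missing (NaT) and one mark is missing (NaN).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-10-01", None, "2021-10-03"]),
    "Marks": [60, 75, None],
})

# Without inplace=True, dropna() returns a new DataFrame and leaves df untouched.
# subset=["Date"] means only a missing 'Date' causes a row to be dropped.
cleaned = df.dropna(subset=["Date"])

print(len(df), len(cleaned))
print(cleaned.to_string())
```

The row with the missing mark is kept, because only the 'Date' column is checked.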

Fixing Wrong Data

Wrong Data

Wrong data is not necessarily an empty cell or a wrongly formatted one; it can simply be a value that does not follow the rule or instruction that applies to its column.

For example, in our data the date in row no. 4 is “NaN”, and the same is true of row no. 6.

Replacing Values

A simple way to correct wrong data is to replace it with the correct value.

Example – Replace a value with new data: set ‘Marks’ to 65 in row number 3.

import pandas as pd

file = pd.read_csv('data.csv')

# Set the 'Marks' cell of the row labelled 3 to 65
file.loc[3,'Marks'] = 65

print(file.to_string())

As the output shows, row 3 now holds the new value 65.

In a small data set you can correct wrong values one at a time, but this is not practical for large data sets.

To replace wrong data in larger data sets, you can set up rules, e.g., define boundaries for legal values and replace any value that falls outside them.
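One detail worth illustrating is that .loc addresses rows by index label, not by position. The sketch below uses a made-up ‘Marks’ column (not the article's data.csv) with the default 0-based index, where label 3 refers to the fourth row.

```python
import pandas as pd

# Hypothetical data standing in for data.csv; 650 plays the role of the wrong value.
df = pd.DataFrame({"Marks": [60, 75, 68, 650, 71]})

# .loc selects by index label; with the default index, label 3 is the fourth row.
df.loc[3, "Marks"] = 65

print(df.to_string())
```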

Example – Loop through all values in the “Marks” column. If the value is higher than 73, set it to 80.

import pandas as pd

df = pd.read_csv('data.csv')

# Visit each row by its index label and overwrite values above 73
for x in df.index:
  if df.loc[x, "Marks"] > 73:
    df.loc[x, "Marks"] = 80

print(df.to_string())
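The same rule can also be written without an explicit loop. The following sketch swaps the loop for a Boolean mask, a common pandas idiom; the sample values are made up and replace data.csv for the sake of a runnable example.

```python
import pandas as pd

# Hypothetical data standing in for data.csv.
df = pd.DataFrame({"Marks": [60, 75, 68, 90, 73]})

# Boolean mask: True where the value is above the limit.
too_high = df["Marks"] > 73

# Assign 80 to every row selected by the mask, in one step.
df.loc[too_high, "Marks"] = 80

print(df.to_string())
```

On large data sets this form is usually faster than looping over the index, and it applies exactly the same rule.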

Removing Rows

Another way of dealing with wrong data is to delete the rows that contain it.

That way, you do not need to work out what to replace the values with, and there is a good chance you will not need those rows for your analysis.

Example – Delete rows where “Marks” is higher than 73.

import pandas as pd

df = pd.read_csv('data.csv')

# Visit each row by its index label and drop it if 'Marks' is above 73
for x in df.index:
  if df.loc[x, "Marks"] > 73:
    df.drop(x, inplace = True)

print(df.to_string())
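As an alternative to the loop, the rows can be filtered with a Boolean mask. This is a sketch on made-up data, not the article's data.csv. It negates the "higher than 73" test rather than testing "<= 73", so that a row with a missing mark is kept, just as the loop would keep it.

```python
import pandas as pd

# Hypothetical data standing in for data.csv; None becomes NaN (a missing mark).
df = pd.DataFrame({"Marks": [60, 75, None, 90, 73]})

# Keep every row that is NOT above 73. NaN > 73 is False, so the NaN row stays.
kept = df[~(df["Marks"] > 73)]

print(kept.to_string())
```

The result is a new DataFrame; the original df still contains all five rows.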

Removing Duplicates

Discovering Duplicates

Duplicate rows are rows that appear more than once in the data set.

To discover duplicate rows, use the duplicated() method.

The duplicated() method returns a Boolean value for each row.

Example – Return True for every row that is a duplicate, otherwise False.

import pandas as pd

file = pd.read_csv('data.csv')

# One Boolean per row: True if the row duplicates an earlier row
print(file.duplicated())

As a result, it returned True for each row that duplicates an earlier row.

To remove duplicate rows, use the drop_duplicates() method.
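To make the behaviour of duplicated() concrete, here is a small sketch on made-up data (the names and marks are hypothetical). By default only the second and later copies of a row are flagged; passing keep=False flags every copy, including the first.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical; rows 1 and 4 differ in 'Marks'.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Asha", "Cara", "Ben"],
    "Marks": [60, 75, 60, 68, 70],
})

# Default (keep="first"): the first occurrence is False, later copies are True.
flags = df.duplicated()
print(flags)

# keep=False flags every member of a duplicate group.
all_copies = df.duplicated(keep=False)
print(all_copies)
```

Note that a row counts as a duplicate only if all of its columns match another row, which is why the two "Ben" rows are not flagged.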

Example – Use the drop_duplicates() method to remove duplicate rows.

import pandas as pd

file = pd.read_csv('data.csv')

# Remove repeated rows from file itself, keeping the first occurrence
file.drop_duplicates(inplace = True)

print(file.to_string())

As shown above, it returned the DataFrame with the duplicate rows removed.

Note: With inplace = True, drop_duplicates() does not return a new DataFrame; it removes the duplicate rows from the original DataFrame.
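For comparison, the sketch below calls drop_duplicates() without inplace, so the original DataFrame is left unchanged and the result is returned. It also shows the subset and keep parameters, which the example above does not use. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical; rows 1 and 4 share only the name.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Asha", "Cara", "Ben"],
    "Marks": [60, 75, 60, 68, 70],
})

# Returns a new DataFrame; df itself still has all five rows.
deduped = df.drop_duplicates()

# Compare on 'Name' only and keep the last row seen for each name.
by_name = df.drop_duplicates(subset=["Name"], keep="last")

print(deduped.to_string())
print(by_name.to_string())
```

Whether to compare whole rows or only selected columns depends on what counts as "the same record" in your data.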
