Introduction to Python Pandas DataFrame Operations

In this article, you will learn about Python Pandas DataFrame operations for cleaning data.

Data in the Wrong Format

Cells with data in the wrong format can make a data set difficult, or even impossible, to analyze.

To correct the issue, you have two options: either delete the affected rows or convert all cells in the column into the same format.

Convert Into a Correct Format

Let’s try to convert all cells in the ‘Date’ column into dates.

Pandas provides the to_datetime() method for converting date-like data into dates.

Example – Convert the ‘Date’ column to the correct date format.

				
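import pandas as pd

# Load the data set into a DataFrame
file = pd.read_csv('data.csv')

# Convert every cell in the 'Date' column to a datetime value
file['Date'] = pd.to_datetime(file['Date'])

# Print the whole DataFrame
print(file.to_string())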
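
By default, to_datetime() raises an error when a cell cannot be parsed as a date at all. A minimal sketch of one way to handle that, assuming a small made-up Series in place of the ‘Date’ column of data.csv, is to pass errors="coerce" so that unparseable values become NaT:

```python
import pandas as pd

# Hypothetical sample values standing in for the 'Date' column of data.csv.
dates = pd.Series(["2020/12/01", "not a date", None])

# errors="coerce" turns anything that cannot be parsed into NaT
# instead of raising an exception.
parsed = pd.to_datetime(dates, errors="coerce")

print(parsed)
```

The NaT entries can then be treated like any other null value.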

				
			

Removing Rows

The conversion in the previous example produces a NaT (Not a Time) value, which Pandas treats as a null value. We can therefore delete the affected row with the dropna() method.

				
import pandas as pd

file = pd.read_csv('data.csv')

file['Date'] = pd.to_datetime(file['Date'])

# Drop every row whose 'Date' cell is null (NaT), modifying the DataFrame in place
file.dropna(subset=['Date'], inplace = True)

print(file.to_string())
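
The same idea can be tried without data.csv. The sketch below assumes a small made-up DataFrame and calls dropna() without inplace, so the original DataFrame stays untouched and the cleaned copy is returned:

```python
import pandas as pd

# Hypothetical sample data; the second row has a missing date (NaT).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-12-01", None, "2020-12-03"]),
    "Marks": [60, 70, 80],
})

# subset=["Date"] limits the null check to the 'Date' column.
cleaned = df.dropna(subset=["Date"])

print(cleaned.to_string())
```

Only the row with the missing date is removed; nulls in other columns would not affect the result because of the subset argument.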

				
			

Fixing Wrong Data

Wrong Data

Wrong data does not have to be an “empty cell” or a “wrong format”; it can simply be a value that is incorrect, for example one that breaks a rule the data is expected to follow.

For example, in our data you can clearly see that the date in row 4 is “NaN”, and the same is true of row 6.

Replacing Values

A simple way to correct wrong data is to replace it with the correct value.

Example – Replace a value with new data by setting “Marks” to 65 in row 3.

				
import pandas as pd

file = pd.read_csv('data.csv')

# Set the 'Marks' value of the row with index label 3 to 65
file.loc[3,'Marks'] = 65

print(file.to_string())

				
			

As shown, the output contains the new value 65 in row 3.
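
Note that loc selects by index label and column name. A minimal sketch, assuming a made-up DataFrame with the default integer index (so label 3 is also the fourth row):

```python
import pandas as pd

# Hypothetical sample data; 999 is the value we consider wrong.
df = pd.DataFrame({"Marks": [50, 60, 70, 999]})

# loc[row_label, column_name] addresses a single cell.
df.loc[3, "Marks"] = 65

print(df.to_string())
```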

In small data sets you can correct wrong values one at a time, but that is not practical for large data sets.

To replace wrong data in larger data sets, you can define rules, e.g., set boundaries for legal values and then replace any value that falls outside them.

Example – Loop through all values in the “Marks” column. If the value is higher than 73, set it to 80.

				
import pandas as pd

df = pd.read_csv('data.csv')

# Check each row and set any mark above 73 to 80
for x in df.index:
  if df.loc[x, "Marks"] > 73:
    df.loc[x, "Marks"] = 80

print(df.to_string())
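
The same rule can also be written without an explicit loop by passing a Boolean mask to loc. This is only a sketch on made-up data, not a replacement for the example above:

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Marks' column of data.csv.
df = pd.DataFrame({"Marks": [50, 73, 74, 90]})

# The mask is True for every row whose mark is above 73;
# loc then assigns 80 to 'Marks' in exactly those rows.
df.loc[df["Marks"] > 73, "Marks"] = 80

print(df.to_string())
```

The mask form applies the rule to the whole column in one step, which tends to be shorter and faster on large data sets.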

				
			

Removing Rows

Another way of dealing with wrong data is to delete the rows that contain it.

That way, you do not have to work out what to replace the values with, and there is a good chance that you will not need those rows for your analysis.

Example – Delete rows where “Marks” is higher than 73.

				
import pandas as pd

df = pd.read_csv('data.csv')

# Check each row and drop it if its mark is above 73
for x in df.index:
  if df.loc[x, "Marks"] > 73:
    df.drop(x, inplace = True)

print(df.to_string())
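
Rows can also be removed by filtering with a Boolean mask instead of looping. The sketch below uses made-up data and negates the condition, so rows with a missing mark are kept, just as in the loop above:

```python
import pandas as pd

# Hypothetical sample data; the third row has no mark (NaN).
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev"],
    "Marks": [50, 74, None, 73],
})

# Keep every row that is NOT above 73. A NaN comparison is False,
# so rows with a missing mark are kept.
kept = df[~(df["Marks"] > 73)]

print(kept.to_string())
```

Writing the filter as df["Marks"] <= 73 would behave differently, because it would also drop the rows with a missing mark.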

				
			

Removing Duplicates

Discovering Duplicates

Duplicate rows are rows that repeat the values of an earlier row.

To discover duplicate rows, use the duplicated() method.

The duplicated() method returns a Boolean value for each row.

Example – Return True for every row that is a duplicate, otherwise False.

				
					import pandas as pd

file = pd.read_csv('data.csv')

print(file.duplicated())

				
			

As a result, it returned True for each row that is a duplicate.
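
To see the flags without data.csv, here is a small sketch on made-up data. By default, duplicated() marks only the later occurrences, so the first copy of a repeated row is reported as False:

```python
import pandas as pd

# Hypothetical sample data; the third row repeats the first.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann"],
    "Marks": [60, 70, 60],
})

# One Boolean per row: True where the row repeats an earlier row.
flags = df.duplicated()

print(flags.tolist())  # [False, False, True]
```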

To remove duplicate rows, use the drop_duplicates() method.

Example – Use the drop_duplicates() method to remove duplicate rows.

				
import pandas as pd

file = pd.read_csv('data.csv')

# Remove duplicate rows from the DataFrame in place
file.drop_duplicates(inplace = True)

print(file.to_string())

				
			

As shown above, the output no longer contains the duplicate rows.

Note: With the argument inplace = True, the method does not return a new DataFrame; it removes the duplicate rows from the original DataFrame instead.
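
For comparison, here is a sketch of the same call without inplace, on made-up data. In this form drop_duplicates() returns a new DataFrame and leaves the original unchanged; the reset_index(drop=True) call is optional and only renumbers the remaining rows:

```python
import pandas as pd

# Hypothetical sample data; the third row repeats the first.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann"],
    "Marks": [60, 70, 60],
})

# Returns a new DataFrame without the repeated row, then renumbers the index.
deduped = df.drop_duplicates().reset_index(drop=True)

print(deduped.to_string())
```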

If you find anything incorrect in the topic discussed above, or have any further questions, please comment below.

codingstreets