In this article, you will learn how to scale values measured in different units so that they can be compared.
Machine Learning Scale – Before moving ahead, let’s take a look at Machine Learning Multiple Regression.
It can be hard to compare data with different values and measurement units. How do you compare meters to miles? Or altitude in relation to time?
Scaling is the solution to this problem. Data can be scaled to new values that are more easily compared.
The data set is the same one used in the multiple-regression lesson, except that the “Volume” column has been replaced by “Height”, which holds values in centimeters.
This file is intended for testing purposes only. Download the file here file.csv
It can be difficult to compare a height in centimeters directly with a weight in kilograms, but once both are scaled to comparable values it is easy to see how one value relates to the other.
There are many ways to scale data. Here, we will be using standardization.
This formula is used in the standardization process:
z = (x - u) / s
Where z is the new value, x is the original value, u is the mean and s is the standard deviation.
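Take the Weight column of the data set (file.csv) as an example: the mean of the column is subtracted from each weight, and the result is divided by the column's standard deviation.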
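The formula can be checked by hand on a small column. The sketch below is only an illustration: the weights are hypothetical values, not the ones in file.csv, and the variable names are made up for this example. It applies z = (x - u) / s directly and compares the result with what StandardScaler produces, assuming the population standard deviation (which is what both NumPy's default std() and StandardScaler use).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical weights in kilograms; these are NOT the values from file.csv.
weights = np.array([50.0, 60.0, 70.0, 80.0, 90.0])

u = weights.mean()  # mean of the column
s = weights.std()   # population standard deviation (ddof=0)

# Apply z = (x - u) / s to every value in the column.
z_manual = (weights - u) / s

# StandardScaler expects a 2-D array: one row per sample, one column per feature.
z_sklearn = StandardScaler().fit_transform(weights.reshape(-1, 1)).ravel()

print(z_manual)
print(np.allclose(z_manual, z_sklearn))  # the two approaches agree
```

A value equal to the mean is scaled to 0, and the sign of each scaled value shows whether the original lies below or above the mean.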
The Python sklearn module provides a class called StandardScaler. Calling StandardScaler() returns a scaler object with methods for transforming data sets.
Example: Scale all values in the Height and Weight columns.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    scale = StandardScaler()

    file_read = pd.read_csv('file.csv')  # read the data set

    X = file_read[["Height", "Weight"]]

    # fit the scaler on both columns and return the scaled values
    value_scaled = scale.fit_transform(X)
    print(value_scaled)

Output -

    [[-0.31932284 -1.62516393]
     [-0.3191847  -1.8620631 ]
     [-0.31020539  0.54077134]
     [-0.31020539  0.87919872]
     [-0.31987542  1.11271362]
     [-0.31089611  0.87919872]
     [-0.31020539  0.77767051]
     [-0.31511638 -1.89590583]
     [-0.30951467 -0.20376891]
     [-0.31158682  1.18378337]
     [-0.31365897  0.54077134]
     [-0.28917308 -0.31883422]
     [ 0.05615741 -0.32221849]
     [ 0.28625571  0.7912076 ]
     [-0.2815752   0.7912076 ]
     [ 4.1777303  -1.31719501]
     [ 0.13154909 -0.32221849]
     [-0.31439113  0.69306366]
     [-0.31678101 -0.32221849]]
The result is an array holding the scaled values of the “Height” and “Weight” columns.
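One way to sanity-check a scaled array is to confirm that each column now has a mean of about 0 and a standard deviation of about 1. The sketch below does this on a small hypothetical data frame standing in for file.csv (the numbers are invented for illustration). It also prints the per-column mean and standard deviation that the fitted scaler stores in its mean_ and scale_ attributes, and uses inverse_transform to recover the original values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for file.csv; the real values are not reproduced here.
df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 200],
    "Weight": [45, 55, 65, 80, 90],
})

scale = StandardScaler()
value_scaled = scale.fit_transform(df[["Height", "Weight"]])

# After standardization each column has mean ~0 and standard deviation ~1.
print(value_scaled.mean(axis=0))
print(value_scaled.std(axis=0))

# The fitted scaler remembers the statistics it used for each column.
print(scale.mean_)   # per-column mean (u in the formula)
print(scale.scale_)  # per-column standard deviation (s in the formula)

# inverse_transform maps scaled values back to the original units.
restored = scale.inverse_transform(value_scaled)
print(np.allclose(restored, df[["Height", "Weight"]].to_numpy()))
```

Because the scaler keeps these statistics, the same object can later be used to scale new observations in exactly the same way.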
Predict ID Values
In the Multiple Regression lesson, the task was to predict a student's ID when only the Weight and Roll_No were known.
When the data set has been scaled, the same scale must also be applied to any new values before they are used for prediction.
Example: Predict the ID of a student who is 200 cm tall and weighs 90 kilograms:
    import pandas as pd
    from sklearn import linear_model
    from sklearn.preprocessing import StandardScaler

    scale = StandardScaler()

    file_read = pd.read_csv('file.csv')  # read the data set

    X = file_read[["Height", "Weight"]]  # independent variables
    y = file_read["ID"]                  # dependent variable

    # fit the scaler and scale the training data
    value_scaled = scale.fit_transform(X)

    # fit the linear regression on the scaled data
    regression = linear_model.LinearRegression()
    regression.fit(value_scaled, y)

    # scale the new observation with the same scaler: Height 200 cm, Weight 90 kg
    scaled = scale.transform(pd.DataFrame([[200, 90]], columns=["Height", "Weight"]))

    # predict the ID from the scaled values, not from the raw ones
    predictIDvalue = regression.predict(scaled)
    print(predictIDvalue)

Output - a one-element array holding the predicted ID; the exact number depends on the values in file.csv.
As a result, it predicted the ID of a student based on the Height and Weight data.
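Forgetting to scale a new observation before calling predict() is an easy mistake to make. One possible way to guard against it, sketched below, is to bundle the scaler and the regression with sklearn's make_pipeline so that every input is scaled automatically. The data frame is a hypothetical stand-in for file.csv with invented values, and the variable names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for file.csv; the values are invented for illustration.
df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 200],
    "Weight": [45, 55, 65, 80, 90],
    "ID": [101, 105, 110, 118, 125],
})
X = df[["Height", "Weight"]]
y = df["ID"]

# The pipeline scales the data first, then fits the regression on the result.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

# New observations are passed in raw units; the pipeline scales them itself.
new_student = pd.DataFrame([[200, 90]], columns=["Height", "Weight"])
pipeline_prediction = model.predict(new_student)

# The manual route: scale, fit, then scale the new values before predicting.
scale = StandardScaler()
regression = LinearRegression().fit(scale.fit_transform(X), y)
manual_prediction = regression.predict(scale.transform(new_student))

print(pipeline_prediction, manual_prediction)
print(np.allclose(pipeline_prediction, manual_prediction))  # both routes agree
```

Both routes give the same prediction; the pipeline simply removes the separate transform step that is easy to leave out.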
If you find anything incorrect in the topic discussed above, or have any further questions, please comment below.